Posts by Michael H.W. Weber

1) Message boards : Number crunching : Quite a few of my tasks end with 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED (Message 490)
Posted 7 Nov 2019 by Michael H.W. Weber
Post:
...in the meantime, around 67 faulty tasks have accumulated, of course consuming relevant compute resources.

I find it quite problematic that, apparently, nobody from the project team has even commented on these issues. In my experience, error reports like those posted above are a very valuable (and free!) tool for improving a project.

Michael.
2) Message boards : Number crunching : Quite a few of my tasks end with 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED (Message 484)
Posted 4 Nov 2019 by Michael H.W. Weber
Post:
P.S.: Another WU will soon cross the 1 hr runtime limit. I guess it will then give the "197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED" error...

As expected:
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5781326

Michael.
3) Message boards : Number crunching : Quite a few of my tasks end with 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED (Message 483)
Posted 4 Nov 2019 by Michael H.W. Weber
Post:
Same here, a lot of "EXIT_DISK_LIMIT_EXCEEDED" errors.
At first I thought it was a problem with my PC, but it seems it's not.

The "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED" is one of the two other errors I mentioned above. Since tasks of this error type were completed successfully by others, I did not go into further detail. One thing, however, is clear: it is not caused by insufficient disk space on my hard disk. I checked the BOINC settings for maximum disk usage and there were still >>10 GB free. I have now readjusted the settings to allow 100 GB of disk usage, yet I still occasionally get the same error:

Errors on all machines:
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5910677
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5866696
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5913490

Error on my machine but successfully validated by another:
https://boinc.nanohub.org/nanoHUB_at_home/result.php?resultid=7180261
https://boinc.nanohub.org/nanoHUB_at_home/result.php?resultid=7180246

Error on my machine, currently under validation by another:
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5916623
https://boinc.nanohub.org/nanoHUB_at_home/result.php?resultid=7243981
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5930751

Michael.

P.S.: Another WU will soon cross the 1 hr runtime limit. I guess it will then give the "197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED" error...
4) Message boards : Number crunching : Quite a few of my tasks end with 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED (Message 476)
Posted 3 Nov 2019 by Michael H.W. Weber
Post:
Since 30 October I have produced 8 of these errors on my single contributing machine (ID 706). This machine otherwise produces error-free results in most cases (only these 8 errors plus 3 more which I caused by playing around with the machine settings), indicating that this is a systematic error not related to my machine. And yes: ALL of these errors have occurred on virtually all other machines trying to process these tasks:

https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5889752
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5870545
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5814681
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5920751
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5876333
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5876330
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5922819
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5871189

The tasks are being resent with a replication of 5, although the cause of the error seems clear: somehow these tasks do not "converge to a result", and it appears to me that the project lead has (validly) decided to run them for around an hour and then abort them (a safety measure?).
This would actually be OK with me BUT THEN THESE TASKS SHOULD RECEIVE CREDITS PROPORTIONAL TO THE CPU POWER INVESTED FOR AROUND THAT HOUR. Anything else makes no sense and is unfair, because with correctly running tasks I can complete one approx. every 1-3 minutes, and the cause of this error is unrelated to the machine hardware. These are valid runs which appear to indicate scientifically unfavorable starting conditions?
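
To put a rough number on that: the sketch below assumes a task is aborted after about 60 minutes and a normal task takes 1 to 3 minutes (both figures from this post). It only compares CPU time and says nothing about how the project actually grants credit; the function name is made up for illustration.

```python
# Illustrative arithmetic only, not the project's credit formula.
# 60 minutes (abort time) and 1-3 minutes (normal runtime) are taken from the post;
# equivalent_normal_tasks is a hypothetical helper name.

def equivalent_normal_tasks(aborted_minutes: float, normal_minutes: float) -> float:
    """Return how many normal tasks fit into the CPU time of one aborted task."""
    if normal_minutes <= 0:
        raise ValueError("normal_minutes must be positive")
    return aborted_minutes / normal_minutes


if __name__ == "__main__":
    for normal in (1, 3):
        count = equivalent_normal_tasks(60, normal)
        print(f"normal runtime {normal} min: one aborted task = {count:.0f} normal tasks of CPU time")
```

Under these assumptions, each aborted task ties up as much CPU time as 20 to 60 successfully completed ones.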

Michael.
5) Message boards : Number crunching : How do I get my pc to run more than one task at a time? (Message 474)
Posted 30 Oct 2019 by Michael H.W. Weber
Post:
all the virtualbox instances on my machine require 2.25 to 2.6 gb of ram to run per core

i had to con the manager into running more instances of virtualbox by adjusting ram usage, and in most cases, can't get all 4 cores to run with 8gb of ram.

setting ram usage to 100% seems like a terrible idea most of the time because the virtual machines will over-allocate and cause massive swapfile thrashing that renders the system unusable and the cpu idles at 0% without any work being done. i've had to use 70% ram usage in the memory manager to get 2 cores to process WUs, and 85% for 3 cores. anything higher than about 86% causes 4 tasks to run and then they all just swap to death with no progress being gained.

maybe this prevents more people from using boinc, because any noob to this wouldn't think to do any of that; they would simply think 'sure use 100% of my ram, go ahead', and it results in 0% work being done in certain use-cases.

Correct.
See this one, too.
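
The figures quoted above can be checked with simple arithmetic. This is a minimal sketch, assuming each VirtualBox task really needs the quoted 2.25 to 2.6 GB and ignoring the RAM used by the operating system itself. `vms_that_fit` is a hypothetical helper, not part of BOINC, and it does not model how the BOINC client decides how many tasks to start.

```python
import math

# Hypothetical helper: how many VMs fit into the RAM share BOINC is allowed to use.
# Inputs (8 GB total, 2.25-2.6 GB per task, 70 % / 85 % / 100 % limits) come from the quote.

def vms_that_fit(total_ram_gb: float, ram_fraction: float, ram_per_vm_gb: float) -> int:
    """Return the number of VMs whose combined RAM stays within the allowed share."""
    if ram_per_vm_gb <= 0:
        raise ValueError("ram_per_vm_gb must be positive")
    budget_gb = total_ram_gb * ram_fraction
    return math.floor(budget_gb / ram_per_vm_gb)


if __name__ == "__main__":
    for fraction in (0.70, 0.85, 1.00):
        low = vms_that_fit(8, fraction, 2.6)    # pessimistic: 2.6 GB per task
        high = vms_that_fit(8, fraction, 2.25)  # optimistic: 2.25 GB per task
        print(f"{fraction:.0%} of 8 GB: between {low} and {high} tasks fit")
```

Even with 100% of 8 GB, no more than three such tasks fit, since four would need at least 9 GB. That is consistent with the swapping described once a fourth task starts.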

Michael.
6) Message boards : Number crunching : What is the relationship between nanohub.org and nanoHUB@Home? (Message 473)
Posted 30 Oct 2019 by Michael H.W. Weber
Post:
+1

Michael.
7) Questions and Answers : Wish list : Distinct applications instead of 'boinc2docker' (Message 472)
Posted 30 Oct 2019 by Michael H.W. Weber
Post:
On your website it reads as follows:


This project supports over 200 simulation tools deployed at nanoHUB.org. These tools are used for both education and research, and they enable simulations that can run in a few minutes or over several hours. The memory requirements can also vary widely, depending on the input parameters to the simulation.

When taking a look at the client_state.xml file, only ONE application name is defined. I am hereby asking you to re-organize the way you deploy apps. And for this request, I have a number of good reasons:

1. When first trying to contribute to this project, I realized that running more than one task on a Windows system with 8 GB of RAM results in a reproducible freeze/bluescreen of the machine. Obviously, your DC project bypasses the BOINC-implemented memory management, which is not unusual for VirtualBox-based projects (such as our own, too).
Meaning in practice: People coming to your project using standard settings will not stay for long unless their machines have more RAM than a standard consumer system.
I solved this issue by allowing only 1 nanoHUB task at a time, which, however, requires people to be familiar with app_config.xml settings. Newbies won't manage that. Result: You lose valuable contributors / compute power.

2. The apps are declared as (mt), i.e. capable of multithreading. Well, I do not see that any of the apps is indeed mt-capable. Instead, each task tested uses at maximum a single core while the rest are idle. Still these tasks run around 1-2 minutes at maximum.
Meaning in practice: A massive waste of compute possibilities. Heavy data transfer frequency.
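
The single-task workaround from point 1 can be sketched as follows. This is a minimal example, not the exact file used here: it generates an `app_config.xml` that uses BOINC's `project_max_concurrent` option to cap the number of simultaneously running tasks of the project. The helper name `build_app_config` is made up for illustration. The resulting file would go into the project's folder inside the BOINC data directory, after which the client has to re-read its config files.

```python
# Hypothetical helper that produces the text of an app_config.xml.
# <project_max_concurrent> limits how many tasks of this project run at once.

def build_app_config(max_tasks: int) -> str:
    """Return app_config.xml content limiting the project to max_tasks running tasks."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    return (
        "<app_config>\n"
        f"    <project_max_concurrent>{max_tasks}</project_max_concurrent>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # Print the file content for a limit of one task at a time.
    print(build_app_config(1))
```

A project-wide limit is used in this sketch because it works without knowing the application name; a per-app limit would need that name from client_state.xml.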

If you declare each of the (as you say) >200 apps as separate entities, it will become possible for users to fine tune settings for each of the apps - with respect to RAM, run times, combination with each other and with other BOINC projects, etc.
You could declare the maximum system requirements for each of these apps on your website so that people can see which of the apps are e.g. RAM hungry and which are not. I, for example, prefer long-running apps, so please also consider bundling multiple tasks of the same type into one task (examples can be found at other DC projects, e.g. Milkyway@home).

Michael.



