Quite a few of my tasks end with 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

TeeVeeEss
Joined: 15 Jan 17
Posts: 1
Credit: 1,317
RAC: 0
Message 135 - Posted: 17 Dec 2018, 11:32:14 UTC
Last modified: 17 Dec 2018, 11:38:51 UTC

Quite a few of my tasks running on a W10 host seem to end with an error indicating an exceeded time limit.
An example is https://boinc.nanohub.org/nanoHUB_at_home/result.php?resultid=440412, host id 115
Can't see why though, any thoughts?

Edit: The last 2 seem OK.
ID: 135 · Rating: 0
Jim1348
Joined: 11 Jan 17
Posts: 65
Credit: 194,081
RAC: 209
Message 136 - Posted: 17 Dec 2018, 13:35:21 UTC - in response to Message 135.  
Last modified: 17 Dec 2018, 14:33:28 UTC

I have not seen that error message, but I have had several that apparently got stuck. I am running on an i7-3770 (Ubuntu 16.04), and one work unit now is at the 9 hour point; most finish in about 2 to 4 hours. Also, the Progress % is shown at 100% on BoincTasks, and the CPU% is now decreasing.
https://boinc.nanohub.org/nanoHUB_at_home/result.php?resultid=441643

I will let it go for a few more hours, and then get rid of it. It may be the same thing you see, but without the error message.
ID: 136 · Rating: 0
Jim1348
Joined: 11 Jan 17
Posts: 65
Credit: 194,081
RAC: 209
Message 137 - Posted: 17 Dec 2018, 15:02:59 UTC - in response to Message 136.  

one work unit now is at the 9 hour point; most finish in about 2 to 4 hours.

I ended it at the 11 hour point, as the CPU% continued to drop. Clearly, it was not doing any work.
ID: 137 · Rating: 0
STE\/E
Joined: 8 Feb 18
Posts: 6
Credit: 41,644
RAC: 0
Message 138 - Posted: 17 Dec 2018, 18:42:47 UTC

I've just started getting some of the WUs to finish successfully today.
ID: 138 · Rating: 0
bcavnaugh
Joined: 31 Mar 19
Posts: 1
Credit: 2,962
RAC: 0
Message 245 - Posted: 1 Apr 2019, 3:58:25 UTC

ID: 245 · Rating: 0
Boco
Joined: 19 Jun 19
Posts: 1
Credit: 10,128
RAC: 12
Message 377 - Posted: 20 Jul 2019, 8:36:08 UTC

Maybe useful for debugging:

The WUs that fail for me (with time exceeded) throw a lot of tar errors upon booting and initialization (something along the lines of "cannot open file, file exists"). Obviously, something goes wrong with preparing the environment.

Additionally, these WUs will never print the final two lines following the "Running..." one. Healthy WUs have

{wuname}.boinc
{wuname}.sh

at the end. The faulty ones haven't. Work never starts.
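
The check described above can be applied to a saved task log with a sketch like the one below. It assumes the log is plain text, that the marker line begins with "Running", and that a healthy task prints one line ending in ".boinc" and one ending in ".sh" right after it. The function name `work_started` and the sample logs are hypothetical and not taken from the project.

```python
def work_started(log_text):
    """Return True if the two lines after the 'Running...' marker look like
    '<wuname>.boinc' and '<wuname>.sh' (assumed log layout, see lead-in)."""
    # Drop empty lines so blank padding in the log does not shift the check.
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        if line.startswith("Running"):
            tail = lines[i + 1:i + 3]
            return (
                len(tail) == 2
                and tail[0].endswith(".boinc")
                and tail[1].endswith(".sh")
            )
    # No marker line at all: the task never reached the run step.
    return False


# Hypothetical sample logs, for illustration only.
healthy_log = "Booting\nRunning...\nexample_wu.boinc\nexample_wu.sh\n"
faulty_log = "Booting\ntar: cannot open file\nRunning...\n"

print(work_started(healthy_log), work_started(faulty_log))
```

A task whose log fails this check would be a candidate for aborting early instead of waiting for the time limit.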
ID: 377 · Rating: 0
Hal Bregg
Joined: 24 Apr 19
Posts: 51
Credit: 112,139
RAC: 33
Message 379 - Posted: 21 Jul 2019, 8:45:33 UTC - in response to Message 137.  

one work unit now is at the 9 hour point; most finish in about 2 to 4 hours.

I ended it at the 11 hour point, as the CPU% continued to drop. Clearly, it was not doing any work.


Those WUs running endlessly are p...g me right off. Letting those WUs run like that, you lose valuable time which could be used to crunch for other projects.

I hope this will be resolved soon.
ID: 379 · Rating: 0
[VENETO] boboviz
Joined: 7 Apr 17
Posts: 29
Credit: 14,343
RAC: 55
Message 443 - Posted: 7 Oct 2019, 20:37:35 UTC

Some of these WUs end with:
<message>
exceeded elapsed time limit 4194.04 (86400.00G/20.60G)</message>
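
The figures in this message can be sanity-checked with a little arithmetic. The sketch below assumes the usual reading of this BOINC client message: the two values in parentheses are the workunit's floating-point operation bound and the estimated application speed on the host, both in units of 1e9, and the limit in seconds is their ratio. That interpretation is an assumption here, and the helper name `parse_time_limit` is made up for the example.

```python
import re

# Matches: exceeded elapsed time limit <seconds> (<bound>G/<speed>G)
PATTERN = re.compile(
    r"exceeded elapsed time limit\s+([\d.]+)\s+\(([\d.]+)G/([\d.]+)G\)"
)


def parse_time_limit(message):
    """Extract the limit and recompute it as bound / speed (assumed meaning)."""
    match = PATTERN.search(message)
    if match is None:
        return None
    limit_s, bound_g, speed_g = (float(x) for x in match.groups())
    return {
        "limit_seconds": limit_s,
        "bound_gflop": bound_g,        # assumed: operation bound, in 1e9 ops
        "speed_gflops": speed_g,       # assumed: estimated speed, in 1e9 ops/s
        "recomputed_seconds": bound_g / speed_g,
    }


info = parse_time_limit("exceeded elapsed time limit 4194.04 (86400.00G/20.60G)")
print(info["limit_seconds"] / 60)      # limit expressed in minutes
print(info["recomputed_seconds"])      # differs slightly: the speed is printed rounded
```

For the message above, the limit works out to roughly 70 minutes, which is in line with the "around an hour" runtime reported elsewhere in this thread. Under this reading, a faster estimated speed on a host would shorten the limit for the same bound.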
ID: 443 · Rating: 0
Michael H.W. Weber
Joined: 29 Dec 18
Posts: 7
Credit: 13,559
RAC: 41
Message 476 - Posted: 3 Nov 2019, 9:07:33 UTC
Last modified: 3 Nov 2019, 9:07:43 UTC

Since the 30th of October I have produced 8 of these errors on my single contributing machine (ID 706). This machine otherwise produces error-free results in most cases (only these 8 errors plus 3 more which I caused by playing around with the machine settings), indicating that it is a systematic error not related to my machine. And yes: ALL of these errors have occurred on virtually all other machines trying to process these tasks:

https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5889752
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5870545
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5814681
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5920751
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5876333
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5876330
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5922819
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5871189

The tasks are being resent with a replication of 5, although the cause of the error seems clear: somehow these tasks do not "converge to a result", and it appears to me that the project lead has (validly) decided to run these for around an hour and then abort them (a safety measure?).
This would actually be OK with me, BUT THEN THESE TASKS REQUIRE RECEIVING CREDITS PROPORTIONAL TO THE CPU POWER INVESTED FOR AROUND THAT HOUR. Anything else makes no sense and is unfair, because with correctly running tasks I can complete one approx. every 1-3 minutes and the cause of this error is unrelated to the machine hardware. These are valid runs which appear to indicate scientifically unfavorable starting conditions?

Michael.
President of Rechenkraft.net e.V. - Germany.
ID: 476 · Rating: 0
surreal
Joined: 8 Oct 19
Posts: 5
Credit: 35,026
RAC: 335
Message 477 - Posted: 3 Nov 2019, 12:30:51 UTC - in response to Message 476.  

yeah i just had a bunch of them give errors after running for 50 minutes or so.

i ran memtest86+ for 3 hours afterwards just in case, but there were no ram faults detected

i just let them keep running for hours on end so that the underlying algorithm or mathematics can be adjusted later in the event that it's an actually useful error.
ID: 477 · Rating: 0
[VENETO] boboviz
Joined: 7 Apr 17
Posts: 29
Credit: 14,343
RAC: 55
Message 481 - Posted: 3 Nov 2019, 22:14:44 UTC

Same here, a lot of "EXIT_DISK_LIMIT_EXCEEDED" errors.
At first I thought it was a problem with my PC, but it seems it's not.
ID: 481 · Rating: 0
Michael H.W. Weber
Joined: 29 Dec 18
Posts: 7
Credit: 13,559
RAC: 41
Message 483 - Posted: 4 Nov 2019, 7:14:31 UTC - in response to Message 481.  
Last modified: 4 Nov 2019, 7:17:41 UTC

Same here, a lot of "EXIT_DISK_LIMIT_EXCEEDED" errors.
At first I thought it was a problem with my PC, but it seems it's not.

The "196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED" is one of the two other errors I mentioned above. Since tasks of this error type were completed successfully by others, I did not go into further details. One thing, however, is clear: it is not caused by insufficient disk space on my HD. I checked the BOINC settings for max. hard disk usage and there were still >>10 GB free. I have now readjusted the settings to allow for 100 GB of disk usage, yet I still occasionally get the same error:

Errors on all machines:
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5910677
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5866696
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5913490

Error on my machine but successfully validated by another:
https://boinc.nanohub.org/nanoHUB_at_home/result.php?resultid=7180261
https://boinc.nanohub.org/nanoHUB_at_home/result.php?resultid=7180246

Error on my machine, currently under validation by another:
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5916623
https://boinc.nanohub.org/nanoHUB_at_home/result.php?resultid=7243981
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5930751
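
One possible explanation, stated here as an assumption, is that this error is raised when a single task's working files grow past a per-workunit disk bound set by the project, not when the host itself runs short of space. That would explain why raising the global disk allowance makes no difference. Under that assumption, the size of a task's working directory can be compared against a bound with a sketch like the following. The function names, the directory path, and the bound value are hypothetical placeholders.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # The file may vanish while the task is still running.
                pass
    return total


def exceeds_bound(path, bound_bytes):
    """True if the directory currently uses more than bound_bytes."""
    return directory_size_bytes(path) > bound_bytes


# Example call (hypothetical path and bound, adjust to the local setup):
# exceeds_bound("/path/to/boinc/slots/0", 8 * 1024**3)
```

Watching this value while a task runs would show whether the task's own files keep growing shortly before the error appears.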

Michael.

P.S.: Another WU will soon cross the 1 hr runtime limit. I guess it will then give the "197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED" error...
President of Rechenkraft.net e.V. - Germany.
ID: 483 · Rating: 0
Michael H.W. Weber
Joined: 29 Dec 18
Posts: 7
Credit: 13,559
RAC: 41
Message 484 - Posted: 4 Nov 2019, 7:51:45 UTC - in response to Message 483.  

P.S.: Another WU will soon cross the 1 hr runtime limit. I guess it will then give the "197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED" error...

As expected:
https://boinc.nanohub.org/nanoHUB_at_home/workunit.php?wuid=5781326

Michael.
President of Rechenkraft.net e.V. - Germany.
ID: 484 · Rating: 0
Michael H.W. Weber
Joined: 29 Dec 18
Posts: 7
Credit: 13,559
RAC: 41
Message 490 - Posted: 7 Nov 2019, 11:21:27 UTC
Last modified: 7 Nov 2019, 11:26:21 UTC

...in the meantime, around 67 faulty tasks have accumulated, of course consuming relevant compute resources.

I find it quite problematic that, apparently, nobody from the project leadership even comments on these issues. In my experience, error reports such as those posted above are a very valuable (and free!) tool to improve a project.

Michael.
President of Rechenkraft.net e.V. - Germany.
ID: 490 · Rating: 0
Tom M
Joined: 27 Mar 20
Posts: 1
Credit: 0
RAC: 0
Message 551 - Posted: 27 Mar 2020, 11:47:20 UTC

Has this problem been replicated when running on 90% of the available BOINC CPUs?
ID: 551 · Rating: 0
Hal Bregg
Joined: 24 Apr 19
Posts: 51
Credit: 112,139
RAC: 33
Message 553 - Posted: 29 Mar 2020, 10:02:54 UTC - in response to Message 551.  

Has this problem been replicated when running on 90% of the available BOINC CPUs?


The project hasn't spluttered out any WUs in weeks. Without new work, no one can tell if the problem still exists.
ID: 553 · Rating: 0