Request for feedback

Hal Bregg

Joined: 24 Apr 19
Posts: 53
Credit: 114,639
RAC: 0
Message 305 - Posted: 20 May 2019, 9:26:37 UTC - in response to Message 204.  
Last modified: 20 May 2019, 9:29:38 UTC

While examining the BOINC job records we noticed that some jobs are failing with the exit code EXIT_ABORTED_VIA_GUI. If you have aborted a nanoHUB@Home job in your client, what was the reason?

In the same analysis we are working to improve the time and disk estimates for nanoHUB tools that produce WUs that fail often.


I cancelled some of the WUs because of the high failure rate caused by the EXIT_DISK_LIMIT_EXCEEDED error.
ID: 305 · Rating: 0
Jim1348

Joined: 11 Jan 17
Posts: 99
Credit: 224,673
RAC: 0
Message 306 - Posted: 20 May 2019, 14:27:49 UTC - in response to Message 305.  

If it is just a question of a few more GB of memory, you could set up a different queue and allow users to select them. Many of us have 16 or 32 GB of memory, and it should be no problem. (On the other hand, there may be something wrong with the work units themselves, and maybe nothing will help.)
ID: 306 · Rating: 0
Henk Haneveld

Joined: 20 Nov 18
Posts: 22
Credit: 14,909
RAC: 0
Message 307 - Posted: 20 May 2019, 14:42:20 UTC

This is just stupid.

20/05/2019 16:33:02 | nanoHUB_at_home | Aborting task 07219627_01_2: exceeded disk limit: 1105.91MB > 1024.00MB

In my last post I asked you to raise the limit from 3072MB to 4096MB.

What do you do? You change it to 1024MB.
ID: 307 · Rating: 0
Henk Haneveld

Joined: 20 Nov 18
Posts: 22
Credit: 14,909
RAC: 0
Message 308 - Posted: 20 May 2019, 18:06:57 UTC

I did a deeper investigation into the results currently on my system.
I found that for the BOINC parameter "rsc_disk_bound", which sets the disk usage limit, you use a mix of 3 values: 1024MB, 2048MB and 3072MB.

Are results created by separate sources that each set their own disk usage limit?

If so, I suggest that you switch to a fixed value of at least 4096MB for all results.
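
For anyone who wants to repeat this check on their own host, here is a minimal Python sketch. It assumes the client's client_state.xml parses as ordinary XML and stores rsc_disk_bound in bytes inside each workunit entry; the function name and the sample data are made up for illustration, so point it at your own file's contents instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def disk_bounds_mb(state_xml: str) -> Counter:
    """Count workunits per rsc_disk_bound value, rounded to whole MB."""
    root = ET.fromstring(state_xml)
    counts = Counter()
    for wu in root.iter("workunit"):
        bound = wu.findtext("rsc_disk_bound")
        if bound is not None:
            # Assumption: the bound is a byte count written as a float.
            counts[round(float(bound) / (1024 * 1024))] += 1
    return counts


# Hand-written sample standing in for a real client_state.xml (hypothetical names).
SAMPLE = """
<client_state>
  <workunit><name>wu_a</name><rsc_disk_bound>1073741824.000000</rsc_disk_bound></workunit>
  <workunit><name>wu_b</name><rsc_disk_bound>2147483648.000000</rsc_disk_bound></workunit>
  <workunit><name>wu_c</name><rsc_disk_bound>3221225472.000000</rsc_disk_bound></workunit>
  <workunit><name>wu_d</name><rsc_disk_bound>3221225472.000000</rsc_disk_bound></workunit>
</client_state>
"""

if __name__ == "__main__":
    for mb, n in sorted(disk_bounds_mb(SAMPLE).items()):
        print(f"{mb} MB: {n} workunit(s)")
```

More than one distinct value in the output would point to the kind of mix described in this post.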
ID: 308 · Rating: 0
Nonexistent_Admin
Volunteer moderator
Project administrator

Joined: 27 Sep 18
Posts: 58
Credit: 0
RAC: 0
Message 309 - Posted: 20 May 2019, 18:06:59 UTC - in response to Message 305.  

Yeah, we had a big problem with workunits hitting the disk limit last week. We added a few new nanoHUB simulation tools to this project, and two of the simulators are written using Matlab. Compiling the Matlab code pulls in a huge chunk of the Matlab runtime, and most of the WUs for those tools hit the disk limit and failed. We increased the disk requirement for those two tools, and I am watching WUs produced by those tools. Thanks for letting us know.
ID: 309 · Rating: 0
Henk Haneveld

Joined: 20 Nov 18
Posts: 22
Credit: 14,909
RAC: 0
Message 310 - Posted: 21 May 2019, 9:05:25 UTC

Admin,

If I understand you correctly, you are trying to assign just the right amount of disk space to each result.
There is no need for that; the setting's purpose is to act as a kill switch for faulty results.

Decide at what amount of disk use even the largest result must be faulty, and set that as the trigger value in the parameter for all results.
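
One possible reading of this suggestion, as a rough Python sketch: take the largest disk usage seen on healthy results, add generous headroom, and use that single value everywhere. The 1.5x headroom, the 1024MB rounding step and the function name are all assumptions chosen for illustration, not anything the project is known to use.

```python
import math


def suggest_disk_bound_mb(peak_usage_mb, headroom=1.5, step_mb=1024):
    """Return one kill-switch value (MB) for all results.

    peak_usage_mb: observed peak disk usage of results considered healthy.
    headroom:      safety factor above the largest healthy peak (assumed).
    step_mb:       the result is rounded up to a multiple of this (assumed).
    """
    if not peak_usage_mb:
        raise ValueError("need at least one observed peak")
    target = max(peak_usage_mb) * headroom
    return math.ceil(target / step_mb) * step_mb


if __name__ == "__main__":
    # 1105.91 MB is the usage reported in the abort message quoted in this thread.
    print(suggest_disk_bound_mb([1105.91]))
```

The point of the design is that the value only has to catch runaway results, so erring on the high side costs nothing as long as the user's own disk quota is respected.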
ID: 310 · Rating: 0
Henk Haneveld

Joined: 20 Nov 18
Posts: 22
Credit: 14,909
RAC: 0
Message 311 - Posted: 21 May 2019, 12:35:03 UTC

Admin,

I just got a batch of new results. You are still using the too-low setting of 3072MB.

Do you think that you have to keep the setting as low as possible because it is used as a disk quota?
If so, you are wrong. Like I said before, it is a kill switch.

The disk quota is controlled by the user and set in the general BOINC client preferences.
If the combined total of all projects/results uses more than that quota, the user gets a message from BOINC.
The user can then either change the quota or, if there is not enough free disk space, stop running a project with high disk space needs.

So once again, please set the limit higher.
I am getting a bit upset that you ask for feedback but then don't take action when information is given.
ID: 311 · Rating: 0
Hal Bregg

Joined: 24 Apr 19
Posts: 53
Credit: 114,639
RAC: 0
Message 312 - Posted: 2 Jun 2019, 12:30:03 UTC

Hello,

Some of the tasks I crunched recently had to be aborted manually or finished with an error.
The main reason was 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED; some of them had been running for over an hour, or even longer.
Besides those, the other invalid tasks failed because of the infamous 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED.

Also, there are no new tasks available again, and workunits waiting for assimilation are piling up.

Please give us an update on the progress of the project.
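
To put numbers on a breakdown like this, a small Python sketch can tally exit codes from a list of finished tasks. Only the two codes named in this post are given names; everything else is lumped under "other". The function name and the sample list are made up for illustration.

```python
from collections import Counter

# Codes and names exactly as quoted in this post; other codes stay unnamed.
EXIT_NAMES = {
    196: "EXIT_DISK_LIMIT_EXCEEDED",
    197: "EXIT_TIME_LIMIT_EXCEEDED",
}


def summarize(exit_codes):
    """Return {label: (count, percent)} for a list of integer exit codes."""
    total = len(exit_codes)
    summary = {}
    for code, n in Counter(exit_codes).most_common():
        if code == 0:
            label = "success"
        else:
            # Same decimal/hex pairing the BOINC task pages show.
            label = f"{EXIT_NAMES.get(code, 'other')} ({code} / 0x{code:08X})"
        summary[label] = (n, 100.0 * n / total)
    return summary


if __name__ == "__main__":
    # Hypothetical sample, not real results from any host.
    for label, (n, pct) in summarize([0, 0, 196, 197, 197]).items():
        print(f"{label}: {n} ({pct:.0f}%)")
```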
ID: 312 · Rating: 0
Nonexistent_Admin
Volunteer moderator
Project administrator

Joined: 27 Sep 18
Posts: 58
Credit: 0
RAC: 0
Message 315 - Posted: 3 Jun 2019, 18:50:53 UTC - in response to Message 312.  

Looking at the results from the weekend, 3% hit the time or disk limit, and 5% were aborted via the GUI. We are working to fix the problem preventing the creation of new WUs; once again, the problem is not with BOINC but with another nanoHUB machine on which all the memory is consumed by monitoring these BOINC jobs. We fixed most of the problems with that system but obviously not all of them. Plus, the two of us running this project have been busy teaching a short class for undergrads using nanoHUB for summer research; that ends today, so we will get this fixed ASAP. Thanks for letting us know.
ID: 315 · Rating: 0
Hal Bregg

Joined: 24 Apr 19
Posts: 53
Credit: 114,639
RAC: 0
Message 316 - Posted: 3 Jun 2019, 21:33:57 UTC - in response to Message 315.  

Looking at the results from the weekend, 3% hit the time or disk limit, and 5% were aborted via the GUI. We are working to fix the problem preventing the creation of new WUs; once again, the problem is not with BOINC but with another nanoHUB machine on which all the memory is consumed by monitoring these BOINC jobs. We fixed most of the problems with that system but obviously not all of them. Plus, the two of us running this project have been busy teaching a short class for undergrads using nanoHUB for summer research; that ends today, so we will get this fixed ASAP. Thanks for letting us know.


Thanks for the update, and please keep us informed about the project's progress.
ID: 316 · Rating: 0
marmot

Joined: 21 Apr 19
Posts: 25
Credit: 12,699
RAC: 0
Message 321 - Posted: 6 Jun 2019, 4:48:38 UTC - in response to Message 289.  


The WUs succeeded on another similar machine with Windows 7 Professional 64 (instead of 8.1) and VirtualBox 5.1.30 installed.

What is different about the WU functioning under VBox on Windows 8.1 versus Windows 7?


My Windows 7 Professional 64 Xeon E5-2660 machine with VBox 5.1.30 finished 397 valid, 0 invalid and 1 error (196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED).

It runs great on that machine, but there is a 100% invalid rate on the nearly identical machines running Windows 8.1.




Just wanted to add that the same Windows 8.1 machine with VBox 5.1.28 installed just successfully ran 227 valid boinc2docker work units received from BOINC@TACC.

Not sure what is different between their version and yours that would make yours fail on identical hardware that runs theirs without an issue.

EDIT: I wonder if that machine would run this boinc2docker if I reverted to VBox versions 5.1.12 - 5.1.18 or 5.1.22 - 5.1.26.



Decided to bring my personal credit on this project to 10,000 points and see if things are better.
Running on the same Windows 8.1 machine, with no changes to its configuration, there were 6 completed: 0 invalid (last time near 100% invalid), 4 valid and 2 errors.

Here are the last lines from a valid log:

2019-06-05 05:01:05 (4152): Guest Log: Running...

2019-06-05 05:01:05 (4152): Guest Log: 07237451_073.boinc

2019-06-05 05:01:05 (4152): Guest Log: 07237451_073.sh

2019-06-05 06:35:50 (4152): Status Report: Elapsed Time: '6000.841696'
2019-06-05 06:35:50 (4152): Status Report: CPU Time: '5688.875000'
2019-06-05 08:11:45 (4152): Guest Log: boinc_app exited (0)

2019-06-05 08:11:45 (4152): Guest Log: Saving results...

2019-06-05 08:11:45 (4152): Guest Log: 07237451_073_output.tar.gz

2019-06-05 08:11:45 (4152): VM Completion File Detected.
2019-06-05 08:11:45 (4152): Powering off VM.
2019-06-05 08:11:47 (4152): Successfully stopped VM.
2019-06-05 08:11:52 (4152): Deregistering VM. (boinc_07810f23863218a0, slot#4)
2019-06-05 08:11:52 (4152): Removing virtual disk drive(s) from VM.
2019-06-05 08:11:52 (4152): Removing network bandwidth throttle group from VM.
2019-06-05 08:11:52 (4152): Removing storage controller(s) from VM.
2019-06-05 08:11:53 (4152): Removing VM from VirtualBox.
08:11:58 (4152): called boinc_finish(0)

</stderr_txt>


Here are the error messages from the errored WU. This is likely one of the 2 VMs that were suspended and restarted when I dropped the total WU count from 4 to 2 because the machine ran out of free RAM.
2019-06-05 05:01:05 (4920): Guest Log: Running...

2019-06-05 05:01:05 (4920): Guest Log: 07237451_062.boinc

2019-06-05 05:01:05 (4920): Guest Log: 07237451_062.sh

2019-06-05 06:35:52 (4920): Status Report: Elapsed Time: '6002.564480'
2019-06-05 06:35:52 (4920): Status Report: CPU Time: '5697.343750'
2019-06-05 08:15:57 (4920): Status Report: Elapsed Time: '12007.088671'
2019-06-05 08:15:57 (4920): Status Report: CPU Time: '11344.687500'
2019-06-05 08:33:48 (4920): Powering off VM.
2019-06-05 08:33:50 (4920): Successfully stopped VM.
2019-06-05 08:33:55 (4920): Deregistering VM. (boinc_02b759967295c02a, slot#6)
2019-06-05 08:33:55 (4920): Removing virtual disk drive(s) from VM.
2019-06-05 08:33:56 (4920): Removing network bandwidth throttle group from VM.
2019-06-05 08:33:56 (4920): Removing storage controller(s) from VM.
2019-06-05 08:33:56 (4920): Removing VM from VirtualBox.

Hypervisor System Log:

07:20:09.673215 ERROR [COM]: aRC=VBOX_E_IPRT_ERROR (0x80bb0005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={SessionMachine} aText={Saved screenshot data is not available (VERR_NOT_SUPPORTED)}, preserve=false aResultDetail=0
07:20:17.723649 Load [C:\Program Files\Oracle\VirtualBox\ExtensionPacks\Oracle_VM_VirtualBox_Extension_Pack\win.amd64\VBoxHostWebcam.DLL] rc VINF_SUCCESS
07:20:17.838896 ERROR [COM]: aRC=VBOX_E_IPRT_ERROR (0x80bb0005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={SessionMachine} aText={Saved screenshot data is not available (VERR_NOT_SUPPORTED)}, preserve=false aResultDetail=0
10:25:47.301881 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={SessionMachine} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:25:47.685702 ERROR [COM]: aRC=VBOX_E_INVALID_OBJECT_STATE (0x80bb0007) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={Medium 'C:\Program Files\Oracle\VirtualBox\VBoxGuestAdditions.iso' is locked for reading by another task}, preserve=false aResultDetail=0
10:25:47.805829 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:47.810713 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:47.814619 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:47.820479 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:47.846846 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:47.898609 ERROR [COM]: aRC=VBOX_E_INVALID_OBJECT_STATE (0x80bb0007) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={Medium 'C:\Program Files\Oracle\VirtualBox\VBoxGuestAdditions.iso' is locked for reading by another task}, preserve=false aResultDetail=0
10:25:47.965998 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:47.967951 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:49.134058 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={0169423f-46b4-cde9-91af-1e9d5b6cd945} aComponent={VirtualBoxWrap} aText={Could not find a registered machine with UUID {b5bb140c-5ce0-463d-aad1-d8144837dc89}}, preserve=false aResultDetail=0
10:25:49.364546 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={0169423f-46b4-cde9-91af-1e9d5b6cd945} aComponent={VirtualBoxWrap} aText={Could not find a registered machine with UUID {b5bb140c-5ce0-463d-aad1-d8144837dc89}}, preserve=false aResultDetail=0
10:25:49.692695 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:49.885094 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:50.563858 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={0169423f-46b4-cde9-91af-1e9d5b6cd945} aComponent={VirtualBoxWrap} aText={Could not find a registered machine with UUID {b5bb140c-5ce0-463d-aad1-d8144837dc89}}, preserve=false aResultDetail=0
10:25:50.868568 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:26:30.913693 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={0169423f-46b4-cde9-91af-1e9d5b6cd945} aComponent={VirtualBoxWrap} aText={Could not find a registered machine named 'boinc_1220cb1aa3d7b8b7'}, preserve=false aResultDetail=0
10:26:31.236960 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:26:31.236960 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:26:31.244773 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:26:31.253563 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:26:31.253563 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:26:31.253563 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:26:31.279931 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:26:31.280907 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:26:31.286767 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:26:44.089527 ERROR [COM]: aRC=VBOX_E_INVALID_OBJECT_STATE (0x80bb0007) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={The given session is busy}, preserve=false aResultDetail=0
10:47:50.716886 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={SessionMachine} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:47:51.325331 ERROR [COM]: aRC=VBOX_E_INVALID_OBJECT_STATE (0x80bb0007) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={Medium 'C:\Program Files\Oracle\VirtualBox\VBoxGuestAdditions.iso' is locked for reading by another task}, preserve=false aResultDetail=0
10:47:51.362443 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:47:51.785328 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:47:51.952332 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:47:52.086132 ERROR [COM]: aRC=VBOX_E_INVALID_OBJECT_STATE (0x80bb0007) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={Medium 'C:\Program Files\Oracle\VirtualBox\VBoxGuestAdditions.iso' is locked for reading by another task}, preserve=false aResultDetail=0

VM Execution Log:
VM Startup Log:
VM Trace Log:

08:34:07 (4920): called boinc_finish(194)

</stderr_txt>



The machines don't take being suspended well?
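
For anyone comparing logs like this across hosts, a quick tally of the distinct COM errors makes the pattern easier to see than reading every line. The following is a minimal Python sketch: the regular expression is written against the line format shown in the Hypervisor System Log above and is an assumption about that format, not something taken from VirtualBox documentation. The function name is made up.

```python
import re
from collections import Counter

# Matches the "ERROR [COM]" lines of the Hypervisor System Log shown above.
# The greedy (.*) lets aText contain nested braces, e.g. a UUID in {...}.
LINE_RE = re.compile(
    r"ERROR \[COM\]: aRC=(\w+) \(0x[0-9a-fA-F]+\).*?aText=\{(.*)\}, preserve="
)


def tally_errors(log_text):
    """Count (result code, message) pairs across all matching log lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Three lines copied from the log above; paste a full log here instead.
SAMPLE_LOG = r"""
10:25:47.805829 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:47.810713 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0
10:25:49.134058 ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={0169423f-46b4-cde9-91af-1e9d5b6cd945} aComponent={VirtualBoxWrap} aText={Could not find a registered machine with UUID {b5bb140c-5ce0-463d-aad1-d8144837dc89}}, preserve=false aResultDetail=0
"""

if __name__ == "__main__":
    for (code, text), n in tally_errors(SAMPLE_LOG).most_common():
        print(f"{n:3d}  {code}: {text}")
```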
ID: 321 · Rating: 0
marmot

Joined: 21 Apr 19
Posts: 25
Credit: 12,699
RAC: 0
Message 330 - Posted: 7 Jun 2019, 11:36:20 UTC - in response to Message 321.  
Last modified: 7 Jun 2019, 11:47:28 UTC



Decided to bring my personal credit on this project to 10,000 points and see if things are better.
Running on the same Windows 8.1 machine, with no changes to its configuration, there were 6 completed: 0 invalid (last time near 100% invalid), 4 valid and 2 errors.



That machine w/ VBox ver 5.1.26 is now at 20 valid and 11 errors (used to get 100% invalids).
10:26:44.089527 ERROR [COM]: aRC=VBOX_E_INVALID_OBJECT_STATE (0x80bb0007) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={MachineWrap} aText={The given session is busy}, preserve=false aResultDetail=0
10:47:50.716886 ERROR [COM]: aRC=E_FAIL (0x80004005) aIID={b2547866-a0a1-4391-8b86-6952d82efaa0} aComponent={SessionMachine} aText={This machine does not have any snapshots}, preserve=false aResultDetail=0
10:47:51.325331 ERROR [COM]: aRC=VBOX_E_INVALID_OBJECT_STATE (0x80bb0007) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={Medium 'C:\Program Files\Oracle\VirtualBox\VBoxGuestAdditions.iso' is locked for reading by another task}, preserve=false aResultDetail=0


The machine that is nearly identical in OS, updates, service configuration and hardware, with VBox ver 5.1.28, is getting 100% invalid results.

Both machines are running other projects successfully, including Boinc2Docker VMs (BOINC@TACC, Cosmology@Home).

The Windows 7 machine with similar hardware is running 100% valid results.
ID: 330 · Rating: 0
Nonexistent_Admin
Volunteer moderator
Project administrator

Joined: 27 Sep 18
Posts: 58
Credit: 0
RAC: 0
Message 336 - Posted: 7 Jun 2019, 18:12:31 UTC - in response to Message 330.  

The Windows 7 machine with similar hardware is running 100% valid results.

For nanoHUB@home?
ID: 336 · Rating: 0
marmot

Joined: 21 Apr 19
Posts: 25
Credit: 12,699
RAC: 0
Message 340 - Posted: 8 Jun 2019, 2:15:19 UTC - in response to Message 336.  
Last modified: 8 Jun 2019, 2:50:20 UTC

The Windows 7 machine with similar hardware is running 100% valid results.

For nanoHUB@home?


Yes, it was then, but now it has a few errors of the same kind as the other machine (no snapshots and locked files). About 150 valid to 5 errors, roughly 3%.

It's running VBox 5.1.30.
If you think it will fix the issue, I can try VBox 5.1.30 on the other two machines.

My machines should be unhidden for you to see now.

The HP laptop with Windows 7 is also running VBox 5.1.28, and 100% of the first nanoHUB BOINC2Docker tasks it tried were valid, while the Win 8.1 machine with 5.1.28 gets 100% invalids.
ID: 340 · Rating: 0
marmot

Joined: 21 Apr 19
Posts: 25
Credit: 12,699
RAC: 0
Message 342 - Posted: 9 Jun 2019, 1:21:33 UTC

I analyzed your VM and saw that its OS resides on a 35MB CDROM image (impressive) and that it uses a shared folder pointing to the host's BOINC slot directory for the app and data storage. All WUs appear to share the same OS and VBoxAdditions image.

If I move to 8 of these machines at once, the failure rate from locked files will climb, perhaps to over 50%.

Since the two images are under 100MB total, wouldn't it be easier to give each BOINC slot its own copy and eliminate the locked/missing image errors?
ID: 342 · Rating: 0
ChristianVirtual

Joined: 1 Feb 19
Posts: 4
Credit: 8,935
RAC: 0
Message 343 - Posted: 10 Jun 2019, 11:25:03 UTC

I get "Postponed: VM Job unmanageable" and "NS_SYSTEM_FAILURE" during start; 7000 WUs and 8000 credits; huge disk space consumption... Often I see runtimes of 200 seconds and wonder what the heck it is doing and whether it can be worth anything. I really struggle to see the point of this project (yes, sorry, I sound negative).

What can I do to get it running better on my EPYC 24/48 with 128GB RAM and CentOS?
ID: 343 · Rating: 0
Nonexistent_Admin
Volunteer moderator
Project administrator

Joined: 27 Sep 18
Posts: 58
Credit: 0
RAC: 0
Message 344 - Posted: 10 Jun 2019, 20:10:57 UTC - in response to Message 343.  

Other BOINC projects have reported some successes for users who restart the client after seeing the "VM Job unmanageable" message. Restrict your client to 1 or 2 WUs at a time to conserve disk space.
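
One way to apply such a limit is an app_config.xml file placed in this project's folder inside the BOINC data directory. The Python sketch below only builds the file's text. It assumes the client supports the project_max_concurrent element; a per-application limit would need the application's name, which is not given in this thread. build_app_config is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_tasks: int) -> str:
    """Return app_config.xml text limiting concurrent tasks for one project."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    root = ET.Element("app_config")
    # Assumption: the client honours a project-wide concurrency cap here.
    ET.SubElement(root, "project_max_concurrent").text = str(max_tasks)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Save this output as app_config.xml in the project's folder.
    print(build_app_config(2))
```

After saving the file, the client has to re-read its config files or be restarted before the limit takes effect.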
ID: 344 · Rating: 0
ChristianVirtual

Joined: 1 Feb 19
Posts: 4
Credit: 8,935
RAC: 0
Message 347 - Posted: 10 Jun 2019, 20:50:47 UTC

VBoxManage: error: The VM session was aborted
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component SessionMachine, interface ISession
Waiting for VM "boinc_0b0bb805c10c8b7e" to power on...

That one is the main reason. I added an app_config with a maximum of 6 instances and will see if it gets better; I also rebooted the box beforehand. But this error actually appeared after the reboot, so I am not very hopeful that it will fix things.

Any chance to dump the vbox experience and go native?
ID: 347 · Rating: 0
ChristianVirtual

Joined: 1 Feb 19
Posts: 4
Credit: 8,935
RAC: 0
Message 349 - Posted: 10 Jun 2019, 20:57:13 UTC - in response to Message 344.  

Other BOINC projects have reported some successes for users who restart the client after seeing the "VM Job unmanageable" message. Restrict your client to 1 or 2 WUs at a time to conserve disk space.

Disk space is not my main problem; the overall "compensation" of ~280 PPD, even with no failures, is.

Also, before my reboot the system load was such that some WCG WUs failed because of resource issues, maybe because VM zombies were piling up. I never had issues with that.

Overall, that is a reason for me to "abort" tasks (as you asked in the OP) if I see a negative impact on the system whenever I try again.
ID: 349 · Rating: 0
Nonexistent_Admin
Volunteer moderator
Project administrator

Joined: 27 Sep 18
Posts: 58
Credit: 0
RAC: 0
Message 351 - Posted: 11 Jun 2019, 13:38:35 UTC - in response to Message 347.  

Any chance to dump the vbox experience and go native

Not really. We have 209 different simulators supported by this project. We don't have the resources to maintain native versions of so many tools, especially for Windows.
ID: 351 · Rating: 0