Posts by marmot

21) Message boards : Number crunching : Invalid rate. (Message 268)
Posted 24 Apr 2019 by marmot
Post:
My Windows 7 Professional 64-bit machine with VBox 5.1.30 finished 397 valid, 0 invalid, and 1 error (196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED).

It runs great on that machine, but the invalid rate is 100% on nearly identical hardware running Windows 8.1.
22) Message boards : News : Request for feedback (Message 264)
Posted 22 Apr 2019 by marmot
Post:
So it's certainly not an issue with available RAM: I freed up 10 GB of 32 GB.
I also shut down 2 of the 8 custom VMs I was running, so there were plenty of free cores.

The WUs succeeded on another similar machine with Windows 7 Professional 64-bit (instead of 8.1) and VirtualBox 5.1.30.
They also succeeded on a Windows 7 Professional 64-bit laptop with VirtualBox 5.1.28 (the same version that failed on Windows 8.1), even though it had very little RAM available (the WUs made the OS swap to the SSD vigorously).

What is different about how the WU runs under VBox on Windows 8.1 versus Windows 7?
23) Message boards : Number crunching : Invalid rate. (Message 263)
Posted 22 Apr 2019 by marmot
Post:
100% invalid rate on my two Windows 8.1 machines with BOINC 7.14.1 (one with VBox 5.1.26, the other with 5.1.28), but a 100% success rate (6/6) on my Windows 7 laptop with BOINC 7.8.3 and VBox 5.1.28. The other Windows 7 machine has BOINC 7.14.1 with VirtualBox 5.1.30 and is completing 100% valid so far.

It seems to be an issue with Windows 8.1 as the host OS.

See: https://boinc.nanohub.org/nanoHUB_at_home/forum_thread.php?id=57&postid=261#261
24) Message boards : Number crunching : Incompatible with VirtualBox 5.2? (Message 262)
Posted 22 Apr 2019 by marmot
Post:
Rolling back VirtualBox is the easy part. You need Linux too.
https://boinc.nanohub.org/nanoHUB_at_home/results.php?hostid=816


Naw, my Windows 7 laptop with VB 5.1.28r just ran 6 of 6 WUs.
25) Message boards : News : Request for feedback (Message 261)
Posted 22 Apr 2019 by marmot
Post:
I plan on aborting hundreds because the project sent down 1200 per machine.

I haven't gotten even 1 valid WU on either machine.
All are invalid, with about 1% errors.

The error occurred once on each machine in the last 24 hours and is:
197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED.
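The decimal and hex values in these error reports are the same number (197 = 0xC5, 196 = 0xC4). A minimal Python sketch reproduces that formatting; the `describe_exit_code` helper is hypothetical, and its lookup table holds only the two codes quoted in this thread, not BOINC's full list.

```python
# Hypothetical helper: format a BOINC exit code the way the task reports show it.
# Only the two codes quoted in this thread are included; anything else is "unknown".
BOINC_EXIT_CODES = {
    196: "EXIT_DISK_LIMIT_EXCEEDED",
    197: "EXIT_TIME_LIMIT_EXCEEDED",
}


def describe_exit_code(code: int) -> str:
    """Return e.g. '197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED'."""
    name = BOINC_EXIT_CODES.get(code, "unknown")
    # 08X = zero-padded, 8-digit, uppercase hex, matching the report format
    return f"{code} (0x{code:08X}) {name}"


if __name__ == "__main__":
    for code in (196, 197):
        print(describe_exit_code(code))
```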


One machine, running 2 WUs concurrently, has 127 invalids; the other runs 1 at a time and has 47 invalids.

All appear to end successfully with:

2019-04-22 01:07:27 (3188): Guest Log: Running... 

2019-04-22 01:07:27 (3188): Guest Log: 07174842_032.boinc

2019-04-22 01:07:27 (3188): Guest Log: 07174842_032.sh

2019-04-22 01:09:04 (3188): Guest Log: boinc_app exited (0)

2019-04-22 01:09:04 (3188): Guest Log: Saving results...

2019-04-22 01:09:04 (3188): Guest Log: 07174842_032_output.tar.gz

2019-04-22 01:09:04 (3188): VM Completion File Detected.
2019-04-22 01:09:04 (3188): Powering off VM.
2019-04-22 01:09:06 (3188): Successfully stopped VM.
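
To check many task logs for this same clean ending (and for the warnings quoted further down), a short script can scan each log. This is a minimal sketch: `summarize_log` is a hypothetical helper, and the marker strings are simply copied from the log excerpts in this post, not an official vboxwrapper format. A clean finish here does not mean the server's validator will accept the output, which is exactly the mismatch described in this post.

```python
import re

# Marker strings copied from the log excerpts in this post (assumptions, not a spec).
COMPLETION_MARKERS = (
    "boinc_app exited (0)",
    "VM Completion File Detected.",
    "Successfully stopped VM.",
)
WARNING_PATTERNS = (
    "Segmentation fault",
    "Unable to connect to system D-Bus",
    "cannot stat",
)
# Each log line looks like "YYYY-MM-DD HH:MM:SS (pid): message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def summarize_log(text: str) -> dict:
    """Report whether all completion markers appear and list warning lines."""
    found_markers = set()
    warnings = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank lines and lines without a timestamp prefix
        timestamp, _pid, message = match.groups()
        for marker in COMPLETION_MARKERS:
            if marker in message:
                found_markers.add(marker)
        if any(pattern in message for pattern in WARNING_PATTERNS):
            warnings.append((timestamp, message))
    return {
        "completed_cleanly": found_markers == set(COMPLETION_MARKERS),
        "warnings": warnings,
    }


if __name__ == "__main__":
    sample = (
        "2019-04-22 01:03:41 (3188): Guest Log: Segmentation fault\n\n"
        "2019-04-22 01:09:04 (3188): Guest Log: boinc_app exited (0)\n"
        "2019-04-22 01:09:04 (3188): VM Completion File Detected.\n"
        "2019-04-22 01:09:06 (3188): Successfully stopped VM.\n"
    )
    print(summarize_log(sample))
```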



There are various warnings and errors throughout the log.
Early in the run:
2019-04-22 01:02:55 (3188): Guest Log: BIOS: KBD: unsupported int 16h function 03

2019-04-22 01:02:55 (3188): Guest Log: BIOS: AX=0305 BX=0000 CX=0000 DX=0000 

2019-04-22 01:02:55 (3188): VM state change detected. (old = 'poweroff', new = 'running')
2019-04-22 01:03:05 (3188): Preference change detected
2019-04-22 01:03:05 (3188): Setting CPU throttle for VM. (100%)
2019-04-22 01:03:05 (3188): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 600 seconds))
2019-04-22 01:03:41 (3188): Guest Log: net.ipv4.ip_forward = 1

2019-04-22 01:03:41 (3188): Guest Log: sysctl: cannot stat /proc/sys/net/ipv6/conf/all/forwarding: No such file or directory

2019-04-22 01:03:41 (3188): Guest Log: sysctl: setting key "cannot stat %s": No such file or directory

2019-04-22 01:03:41 (3188): Guest Log: sysctl: "cannot stat %s" is an unknown key

2019-04-22 01:03:41 (3188): Guest Log: sysctl: setting key "cannot stat %s": No such file or directory

2019-04-22 01:03:41 (3188): Guest Log: Segmentation fault


This error repeats on all 3 connection attempts (1/3 through 3/3):
2019-04-22 01:03:46 (3188): Guest Log: 00:00:00.061615 vminfo   Error: Unable to connect to system D-Bus (1/3): D-Bus not installed


I'm not sure what normal execution of the work app looks like, but here's what the logs show:

2019-04-22 01:03:52 (3188): Guest Log: -------------------

2019-04-22 01:03:52 (3188): Guest Log: Linking /etc/docker to /var/lib/boot2docker for persistence

2019-04-22 01:03:52 (3188): Guest Log: Waiting for Docker daemon to start...

2019-04-22 01:03:52 (3188): Guest Log: REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

2019-04-22 01:03:52 (3188): Guest Log: Running boinc_app...

2019-04-22 01:03:52 (3188): Guest Log: Importing Docker image from BOINC...

2019-04-22 01:03:52 (3188): Guest Log: Filesystem                Size      Used Available Use% Mounted on

2019-04-22 01:03:52 (3188): Guest Log: tmpfs                     2.6G    164.2M      2.5G   6% /

2019-04-22 01:03:52 (3188): Guest Log: tmpfs                     1.5G         0      1.5G   0% /dev/shm

2019-04-22 01:03:52 (3188): Guest Log: cgroup                    1.5G         0      1.5G   0% /sys/fs/cgroup

2019-04-22 01:03:52 (3188): Guest Log: shared                  407.9G    121.6G    286.3G  30% /root/shared

2019-04-22 01:03:52 (3188): Guest Log: tmpfs                     2.6G    164.2M      2.5G   6% /var/lib/docker/aufs

2019-04-22 01:03:57 (3188): Guest Log: 00:00:10.078023 vminfo   Error: Unable to connect to system D-Bus (3/3): D-Bus not installed

2019-04-22 01:04:58 (3188): Guest Log: doing docker load...

2019-04-22 01:06:00 (3188): Guest Log: Filesystem                Size      Used Available Use% Mounted on

2019-04-22 01:06:00 (3188): Guest Log: tmpfs                     2.6G    631.9M      2.0G  23% /

2019-04-22 01:06:00 (3188): Guest Log: tmpfs                     1.5G         0      1.5G   0% /dev/shm

2019-04-22 01:06:00 (3188): Guest Log: cgroup                    1.5G         0      1.5G   0% /sys/fs/cgroup

2019-04-22 01:06:00 (3188): Guest Log: shared                  407.9G    121.5G    286.4G  30% /root/shared

2019-04-22 01:06:00 (3188): Guest Log: tmpfs                     2.6G    631.9M      2.0G  23% /var/lib/docker/aufs

2019-04-22 01:06:00 (3188): Guest Log:               total        used        free      shared  buff/cache   available

2019-04-22 01:06:00 (3188): Guest Log: Mem:           3007          48        2289         631         669        2289

2019-04-22 01:06:00 (3188): Guest Log: Swap:           701           0         701

2019-04-22 01:06:00 (3188): Guest Log: Building apps directory...

2019-04-22 01:07:21 (3188): Guest Log: Prerun diagnostics...

2019-04-22 01:07:21 (3188): Guest Log: REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

2019-04-22 01:07:21 (3188): Guest Log: nanohub_apps_base   11                  42db2bf7db3d        16 months ago       458.1 MB

2019-04-22 01:07:26 (3188): Guest Log: CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

2019-04-22 01:07:26 (3188): Guest Log: 467.8M	/var/lib/docker

2019-04-22 01:07:27 (3188): Guest Log:               total        used        free      shared  buff/cache   available

2019-04-22 01:07:27 (3188): Guest Log: Mem:           3007          48        2287         631         671        2288

2019-04-22 01:07:27 (3188): Guest Log: Swap:           701           0         701

2019-04-22 01:07:27 (3188): Guest Log: Filesystem                Size      Used Available Use% Mounted on

2019-04-22 01:07:27 (3188): Guest Log: tmpfs                     2.6G    631.9M      2.0G  23% /

2019-04-22 01:07:27 (3188): Guest Log: tmpfs                     1.5G         0      1.5G   0% /dev/shm

2019-04-22 01:07:27 (3188): Guest Log: cgroup                    1.5G         0      1.5G   0% /sys/fs/cgroup

2019-04-22 01:07:27 (3188): Guest Log: shared                  407.9G    121.6G    286.3G  30% /root/shared

2019-04-22 01:07:27 (3188): Guest Log: tmpfs                     2.6G    631.9M      2.0G  23% /var/lib/docker/aufs

2019-04-22 01:07:27 (3188): Guest Log: Running... 

2019-04-22 01:07:27 (3188): Guest Log: 07174842_032.boinc

2019-04-22 01:07:27 (3188): Guest Log: 07174842_032.sh

2019-04-22 01:09:04 (3188): Guest Log: boinc_app exited (0)


Both machines run Windows 8.1; one has VB 5.1.26 and the other 5.1.28.
Both run my own custom VMs all day, every day, and have run Theory (40,000 hours), ATLAS, CMS and Cosmology VMs heavily.

They are running right up against their RAM limit at 99.97% core usage. I've seen heartbeat errors before, but almost all WUs are hardened against those, and the only mention of a heartbeat in the logs is:
2019-04-22 01:03:41 (3188): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds


I'll stop after 100 WUProps hours.
Sorry, I don't unhide my machines except in rare circumstances.

EDIT:

Additional info.

The WUs did use a full core for their run. I didn't monitor network or drive usage but will if that would be helpful.
One work unit did end in abort status, and I had to terminate the VirtualBox service with a process manager because VirtualBox Manager would accept commands but not carry them out. It was the oddest behavior I'd ever seen from VBox.

These machines are actually downclocked because the weather is warming. The Xeons are staying at 55 °C and the RAM under 35 °C.

EDIT2:
I'll assume this is a case of the VM being unable to claim enough host virtual memory, even though used RAM is 200 MB below the maximum and commits are just 300 MB above physical RAM. I'm shutting down management software, lowering a competing project to 1 WU, and limiting nanoHUB to 1 WU as well.
My laptop with Windows 7 and VB 5.1.28r just completed 7/7 WUs successfully.




©2024 COPYRIGHT 2017-2018 NCN