Wednesday, April 11, 2007

reproducible hang with ldom preview

This blog entry is because formatting is horribly broken at http://forum.java.sun.com/thread.jspa?threadID=5159568 where I originally posted it.

I configured a T2000 as described in the beginner's guide (http://www.sun.com/blueprints/0207/820-0832.pdf) with the exception of the device allocated for the root disk. For that I came up with my own variant of http://unixconsole.blogspot.com/2007/04/time-to-build-guest-domain.html.

My variant of using a file involved creating the file with mkfile on a zfs file system. That is...

zpool create zfs mirror c1t0d0s4 c1t1d0s4
zfs create zfs/ldoms
zfs set compress=on zfs/ldoms
mkfile 32G /zfs/ldoms/root.img
As I install Solaris in the ldom, the server (control domain) dies after extracting a few hundred megabytes of a flash archive. I have traced this down to it running out of memory. Here's "vmstat 4" output on the control domain console:
...
0 0 0 8636536 24776  1  70  0  0  0  0  0  0  0  0  0 1954  311 2244  0 23 76
0 0 0 8645096 22280  1  36  0  0  0  0  0  0  0  0  0 5957  370 9827  0 19 80
0 0 0 8649008 17720  1  40  0 296 313 0 44 0  0  0  0 9975  361 15877 0 24 75
0 0 0 8651104 15944  1  52  0 807 1671 0 700 0 0 0  0 10725 347 17545 0 26 74
0 0 0 8650800 18376  0  60  0 88 239 0 127 0  0  0  0 9816  391 15545 1 33 67
0 0 0 8640432 15936  0  76  0 497 3025 0 3874 0 0 0 0 11367 432 17975 0 35 65
0 0 0 8642968 17032  1  59  0 452 2028 0 842 0 0 0  0 10266 363 16127 0 27 73
kthr      memory            page            disk          faults      cpu
r b w   swap  free  re  mf pi po fr de sr m0 m1 m2 m1   in   sy   cs us sy id
0 0 0 8644768 15744  0  56  0 387 1298 0 126 0 0 0  0 10170 330 16355 0 24 75
0 0 0 8652504 18368  1 113  0 372 2462 0 273 0 0 0  0 11171 321 18613 0 35 65
0 0 0 8652832 15720  1 134  0 411 6081 0 738 0 0 0  0 11541 332 18979 0 34 66
0 0 0 8652232 14312  1  94  0 413 1806 0 7775 0 0 0 0 10718 358 18271 0 38 62
0 0 0 8647360 12592 18 133  9 555 5176 0 17490 1 0 1 0 10394 320 16970 1 37 63
0 0 0 8645248 14408  2  73 22 486 5039 0 3111 2 1 1 0 11749 383 18336 0 40 59
2 0 43 8641800 2784  1 148 99 1070 1517 0 53982 19 9 9 5 8316 356 14226 0 43 57
0 0 116 8647032 800  1  42 127 134 312 3688 76207 14 7 7 1 2153 114 3726 0 29 71
At this point the server froze. Note that 116 processes were swapped and the "de" column is 3688. Very bad news.

My initial thoughts were that I was running into some of the low-memory problems known to happen with the ZFS arc. This does not seem to be the case. According to mdb, the arc size is around 60 MB:

# mdb unix.3 vmcore.3
...
> arc::print -td size
uint64_t size = 0t61455360
The control domain is S10 11/06 + 118833-36 + those required for ldoms + many others. The ldom is in the process of being installed is booted from a S10 11/06 netinstall image (118833-33).

No comments: