Friday, April 20, 2007

Today's dtrace one-liners

Well, I've been spending a bit more time with dtrace lately. I'm now starting to do interesting things with dtrace without looking at the manual or snagging other scripts. Today I found some processes that looked like they weren't doing anything useful. In particular, I saw the following in "prstat -mL":

   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 
 15274 oracle    46  53 0.0 0.0 0.0 0.0 0.0 0.0   0  67 .22   0 racgmain/1
 15295 oracle    47  53 0.0 0.0 0.0 0.0 0.0 0.0   0  74 .23   0 racgmain/1
 16810 oracle    47  53 0.0 0.0 0.0 0.0 0.0 0.0   0  66 .23   0 racgmain/1
 25029 oracle    47  53 0.0 0.0 0.0 0.0 0.0 0.0   0  70 .23   0 racgmain/1

First, it is confusing that prstat says these processes are doing less than one system call per second (.22 or .23) while spending more than 50% of their time in system calls. Looks like a bug in prstat. To see which system call was being used...

# dtrace -n 'syscall:::entry/execname == "racgmain"/ { @[probefunc] = count() }'
dtrace: description 'syscall:::entry' matched 232 probes
^C

  read                                                         269039

Ahh, so it is only reads that are causing all of this. Let's see which file descriptors are involved.

# dtrace -n 'syscall::read:entry/execname == "racgmain"/{ @[pid, probefunc,arg0] = count() }'
^C
    22469  read      9                7
    22469  read      4                7
    22469  read      3                9
    22469  read      8                9
    22469  read     10               10
    22469  read      7               21
    22469  read      6               24
    15274  read     10          1026462
    15295  read     10          1044932
    25029  read     10          1051450
    16810  read     11          1060720

It seemed rather odd that there would be so much reading on a single file descriptor per process (10 or 11, depending on the process). If those reads were really transferring data, I would expect prstat to show some other processes high in %sys doing the corresponding writes. Using pfiles on each of the pids showed that the file descriptor in question was a FIFO, but gave no indication what (if anything) was on the other end of the FIFO.
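
For reference, the pfiles check was along these lines (the fd numbers come from the dtrace output above; I'm not reproducing the pfiles output here):

# pfiles 15274 | egrep '^ *10:'
# pfiles 16810 | egrep '^ *11:'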

To help answer the question of what might be on the other end of the FIFO, I tried this (ok, not a one-liner...)

#!/usr/sbin/dtrace -qs

/*
 * For each racgmain process, count read() returns per second,
 * keyed by pid and by the return value (bytes read).
 */
syscall::read:return
/execname == "racgmain"/
{
        @c[pid, arg0] = count();
}

/* Dump and reset the counts once a second. */
tick-1s
{
        printf("%6s %6s %4s\n", " pid  ", " size ", "count");
        printa("%6d %6d %@4d\n", @c);
        clear(@c);
}

Which gave the following output...

 pid    size  count
 15295      0 117524
 15274      0 121065
 25029      0 121308
 16810      0 125358
 pid    size  count
 15274      0 108177
 25029      0 109639
 15295      0 115735
 16810      0 117175
 pid    size  count
 15295      0 119411
 15274      0 119471
 25029      0 120273
 16810      0 120650

Imagine that... these processes are each doing 100,000+ zero-byte reads per second. A read() that returns 0 on a FIFO means end-of-file (nothing has the write side open), so these daemons are just spinning on EOF. Time to tell the DBAs that they have some real problems with one of their daemons.
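
If I had wanted to chase the other end of the FIFO with dtrace instead of pfiles, something along these lines should show whether anything on the box is writing to a FIFO at all (a sketch I didn't actually run; the fds[] array and its fi_fs member are described in the Solaris Dynamic Tracing Guide):

# dtrace -n 'syscall::write:entry /fds[arg0].fi_fs == "fifofs"/ { @[execname, pid] = count(); }'

Given that all of those reads return 0, I would expect this to stay pretty quiet.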

Wednesday, April 11, 2007

reproducible hang with ldom preview

This blog entry exists because the formatting is horribly broken at http://forum.java.sun.com/thread.jspa?threadID=5159568, where I originally posted it.

I configured a T2000 as described in the beginner's guide (http://www.sun.com/blueprints/0207/820-0832.pdf) with the exception of the device allocated for the root disk. For that I came up with my own variant of http://unixconsole.blogspot.com/2007/04/time-to-build-guest-domain.html.

My variant of using a file involved creating the file with mkfile on a ZFS file system. That is...

zpool create zfs mirror c1t0d0s4 c1t1d0s4
zfs create zfs/ldoms
zfs set compress=on zfs/ldoms
mkfile 32G /zfs/ldoms/root.img
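
From there, handing the backing file to the guest domain is just the usual ldm steps from the guides linked above, roughly (the volume, disk, and domain names here are invented for illustration):

ldm add-vdsdev /zfs/ldoms/root.img rootvol@primary-vds0
ldm add-vdisk rootdisk rootvol@primary-vds0 myldom1
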
While installing Solaris in the ldom, the server (control domain) dies after extracting a few hundred megabytes of a flash archive. I have traced this down to the control domain running out of memory. Here's "vmstat 4" output on the control domain console:
...
0 0 0 8636536 24776  1  70  0  0  0  0  0  0  0  0  0 1954  311 2244  0 23 76
0 0 0 8645096 22280  1  36  0  0  0  0  0  0  0  0  0 5957  370 9827  0 19 80
0 0 0 8649008 17720  1  40  0 296 313 0 44 0  0  0  0 9975  361 15877 0 24 75
0 0 0 8651104 15944  1  52  0 807 1671 0 700 0 0 0  0 10725 347 17545 0 26 74
0 0 0 8650800 18376  0  60  0 88 239 0 127 0  0  0  0 9816  391 15545 1 33 67
0 0 0 8640432 15936  0  76  0 497 3025 0 3874 0 0 0 0 11367 432 17975 0 35 65
0 0 0 8642968 17032  1  59  0 452 2028 0 842 0 0 0  0 10266 363 16127 0 27 73
kthr      memory            page            disk          faults      cpu
r b w   swap  free  re  mf pi po fr de sr m0 m1 m2 m1   in   sy   cs us sy id
0 0 0 8644768 15744  0  56  0 387 1298 0 126 0 0 0  0 10170 330 16355 0 24 75
0 0 0 8652504 18368  1 113  0 372 2462 0 273 0 0 0  0 11171 321 18613 0 35 65
0 0 0 8652832 15720  1 134  0 411 6081 0 738 0 0 0  0 11541 332 18979 0 34 66
0 0 0 8652232 14312  1  94  0 413 1806 0 7775 0 0 0 0 10718 358 18271 0 38 62
0 0 0 8647360 12592 18 133  9 555 5176 0 17490 1 0 1 0 10394 320 16970 1 37 63
0 0 0 8645248 14408  2  73 22 486 5039 0 3111 2 1 1 0 11749 383 18336 0 40 59
2 0 43 8641800 2784  1 148 99 1070 1517 0 53982 19 9 9 5 8316 356 14226 0 43 57
0 0 116 8647032 800  1  42 127 134 312 3688 76207 14 7 7 1 2153 114 3726 0 29 71
At this point the server froze. Note that 116 LWPs were swapped out, free memory was down to 800 KB, and the "de" column is 3688. Very bad news.

My initial thought was that I was running into some of the low-memory problems known to happen with the ZFS ARC. That does not seem to be the case. According to mdb, the ARC size is only around 60 MB:

# mdb unix.3 vmcore.3
...
> arc::print -td size
uint64_t size = 0t61455360
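
The obvious next question is where the memory actually went. I haven't included that analysis here, but on the same crash dump it would be something along the lines of ::memstat (which summarizes page usage by consumer: kernel, anon, exec and libs, page cache, free) plus a look at the ARC's target and cap to compare against the 60 MB size above:

# mdb unix.3 vmcore.3
> ::memstat
> arc::print -td c c_max
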
The control domain is S10 11/06 + 118833-36 + those required for ldoms + many others. The ldom that is in the process of being installed is booted from a S10 11/06 netinstall image (118833-33).