Today's dtrace one-liners
Well, I've been spending a bit more time with dtrace lately. I'm now starting to do interesting things with dtrace without looking at the manual or snagging other scripts. Today I found some processes that looked like they weren't doing much of use. In particular, I saw the following in "prstat -mL"
   PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 15274 oracle     46  53 0.0 0.0 0.0 0.0 0.0 0.0   0  67 .22   0 racgmain/1
 15295 oracle     47  53 0.0 0.0 0.0 0.0 0.0 0.0   0  74 .23   0 racgmain/1
 16810 oracle     47  53 0.0 0.0 0.0 0.0 0.0 0.0   0  66 .23   0 racgmain/1
 25029 oracle     47  53 0.0 0.0 0.0 0.0 0.0 0.0   0  70 .23   0 racgmain/1
First, it is confusing that prstat reports these processes making less than one system call per interval (SCL of .22 or .23) while spending more than 50% of their time in system mode (SYS). Looks like a bug in prstat. To see which system call was being used...
# dtrace -n 'syscall:::entry/execname == "racgmain"/ { @[probefunc] = count() }'
dtrace: description 'syscall:::entry' matched 232 probes
^C
  read                                                     269039

Ahh, so it is only reads that are causing all of this. Let's see which file descriptors are involved.
# dtrace -n 'syscall::read:entry/execname == "racgmain"/{ @[pid, probefunc, arg0] = count() }'
^C
 22469  read        9                7
 22469  read        4                7
 22469  read        3                9
 22469  read        8                9
 22469  read       10               10
 22469  read        7               21
 22469  read        6               24
 15274  read       10          1026462
 15295  read       10          1044932
 25029  read       10          1051450
 16810  read       11          1060720
It seemed rather odd that there would be so much reading on a single file descriptor per process (10 or 11, depending on the process). If these were real reads of real data, I would expect to see some other processes showing up in prstat that were high in %sys doing the corresponding writes. Using pfiles on each of the pids showed that the file descriptor in question was a FIFO, but gave no indication what (if anything) was on the other end of the FIFO.
To help answer the question of what might be on the other end of the FIFO, I tried this (ok, not a one liner...)
#!/usr/sbin/dtrace -qs

syscall::read:return
/execname == "racgmain"/
{
        @c[pid, arg0] = count();
}

tick-1s
{
        printf("%6s %6s %4s\n", " pid ", " size ", "count");
        printa("%6d %6d %@4d\n", @c);
        clear(@c);
}
Which gave the following output...
  pid    size  count
 15295       0 117524
 15274       0 121065
 25029       0 121308
 16810       0 125358
  pid    size  count
 15274       0 108177
 25029       0 109639
 15295       0 115735
 16810       0 117175
  pid    size  count
 15295       0 119411
 15274       0 119471
 25029       0 120273
 16810       0 120650
Imagine that... these processes are each doing 100,000+ zero-byte reads per second. Time to tell the DBAs that they have some real problems with one of their daemons.