hunting bugs in filebench
I've been using filebench a bit at work and decided that I would like to try a few things out at home. My home machine is not quite as beefy as the V40zs that I have been testing on at work. Getting filebench to compile in the first place is a bit of work. It probably works really well on someone else's system, but mine is obviously different. That's another story, though. After compiling filebench, I ran it for the first time and saw this:
    $ /opt/filebench/bin/filebench
    Segmentation fault (core dumped)

Bummer. Well, let's see where that is at:
    $ gdb /opt/filebench/bin/filebench core
    GNU gdb 6.4-debian
    . . .
    (gdb) where
    #0  0x37dd84aa in memset () from /lib/tls/i686/cmov/libc.so.6
    #1  0x0807b01e in ?? ()
    #2  0x080522da in ipc_init () at ipc.c:264
    #3  0x08058bc1 in main (argc=1, argv=0x3f8fdcf4) at parser_gram.y:1140

OK, so let's go with the assumption that the bug is in the code listed as alpha on the web site, and not libc. So we go up the stack a couple levels.
    (gdb) up 2
    #2  0x080522da in ipc_init () at ipc.c:264
    264             memset(filebench_shm, 0, c2 - c1);
    (gdb) print filebench_shm
    $1 = (filebench_shm_t *) 0xffffffff

Hmmm... 0x with a bunch of f's looks like -1. Perhaps some system call on Solaris (presumably where filebench started) returns NULL on error and on Linux it returns -1. Let's go looking for that system call.
    (gdb) list
    259     #endif /* USE_PROCESS_MODEL */
    260
    261             c1 = (caddr_t)filebench_shm;
    262             c2 = (caddr_t)&filebench_shm->marker;
    263
    264             memset(filebench_shm, 0, c2 - c1);
    265             filebench_shm->epoch = gethrtime();
    266             filebench_shm->debug_level = 2;
    267             filebench_shm->string_ptr = &filebench_shm->strings[0];
    268             filebench_shm->shm_ptr = (char *)filebench_shm->shm_addr;

Nope, not there. Maybe a bit further up.
    (gdb) list 250
    245     #endif
    246
    247             if ((filebench_shm = (filebench_shm_t *)mmap(0, sizeof(filebench_shm_t),
    248                 PROT_READ | PROT_WRITE,
    249                 MAP_SHARED, shmfd, 0)) == NULL) {
    250                     filebench_log(LOG_FATAL, "Cannot mmap shm");
    251                     exit(1);
    252             }
    253
    254     #else

It looks like mmap may be the culprit. I first asked man, but this is Linux, not Solaris. No man page for mmap! Next, try Google. Google comes up with this page that looks a lot like a man page. Why isn't that found on my system? Another thing for another day. Anyway, it says:
    RETURN VALUE
    On success, mmap returns a pointer to the mapped area. On error, the value MAP_FAILED (that is, (void *) -1) is returned, and errno is set appropriately. On success, munmap returns 0, on failure -1, and errno is set (probably to EINVAL).

OK, so it is returning -1 because it doesn't like something. Let's see what it is trying to mmap:
    (gdb) print sizeof(filebench_shm_t)
    $2 = 907368000
    (gdb) print sizeof(filebench_shm_t) / 1024 / 1024
    $3 = 865
    (gdb)

That 'splains it. It looks like it is trying to set up a shared memory segment that is 865 MB. My poor little system only has 512 MB. FWIW, I have created a patch that addresses this one problem, but I haven't had a chance to test it on Solaris yet. Unfortunately, with the patch, it just tells me that the mmap failed. It doesn't address the fact that it is trying to allocate a shared memory segment larger than the size of RAM on my system.
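For the "larger than RAM" part, here is a rough sketch of how a program could compare a requested segment size against physical memory before calling mmap. This is not filebench code. The helper names bytes_to_mb and exceeds_physical_ram are made up for illustration, and _SC_PHYS_PAGES is an extension that Linux and Solaris provide rather than something POSIX guarantees.

```c
#include <unistd.h>

/* Hypothetical helper (not part of filebench): convert a byte count to
 * whole megabytes, the same way the gdb expression above does. */
unsigned long long bytes_to_mb(unsigned long long bytes)
{
    return bytes / 1024 / 1024;
}

/* Hypothetical helper (not part of filebench): returns 1 if `bytes` is
 * larger than physical RAM, 0 if it is not, and -1 if the amount of RAM
 * cannot be determined.
 * Assumption: sysconf(_SC_PHYS_PAGES) is available; it is an extension
 * found on Linux and Solaris, not a POSIX requirement. */
int exceeds_physical_ram(unsigned long long bytes)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return -1;

    return bytes > (unsigned long long)pages * (unsigned long long)page_size;
}
```

Calling exceeds_physical_ram(sizeof(filebench_shm_t)) before the mmap would at least let the error message say why the mapping is unlikely to work. Whether a mapping bigger than RAM actually fails depends on swap and overcommit settings, so treat this as a hint rather than a hard rule.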
Update 1:
I have posted several patches to the bug tracking system at sourceforge.net. This particular one is 1432638. It turns out that mmap on Solaris also returns MAP_FAILED so the patch is simpler than I originally expected.
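For anyone curious, a portable check looks roughly like the sketch below. It is not the actual patch, just a minimal illustration of comparing against MAP_FAILED instead of NULL. The wrapper name map_shared_or_null is hypothetical.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wrapper (not filebench code): map `len` bytes of `fd` as a
 * shared, read/write mapping. Returns the mapping on success or NULL on
 * failure, so callers have one unambiguous error value to test. */
void *map_shared_or_null(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* mmap reports failure with MAP_FAILED, i.e. (void *)-1, not NULL,
     * so a "== NULL" test lets the bad pointer through. */
    if (p == MAP_FAILED)
        return NULL;

    return p;
}
```

With a check like this, a bad file descriptor or an oversized request ends up in the error path instead of handing 0xffffffff to memset. errno is still set by mmap, so the caller can report the reason.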