Re: Cron <root@haddock> /sbin/atactl /dev/wd0c smartstatus > /dev/null

Han Boetes
Cron Daemon wrote:
> Segmentation fault (core dumped)

In that case I got a few more corefiles to inspect:

~% ls -l $(locate .core)
-rw------- 1 han wheel 664160 Dec  9 02:59 /var/cron/cron.core
-rw------- 1 han wheel 619128 Dec  9 04:59 /var/log/newsyslog.core
-rw------- 1 han wheel 217660 Dec  9 04:59 /var/log/sh.core

The newsyslog.core is the one I mentioned in the previous email.

But these are a bit trickier, since I don't know what caused them
exactly.

/usr/src/bin/ksh% gdb ./ksh /var/log/sh.core
[snip: copyright notice]
warning: exec file is newer than core file.
Core was generated by `sh'.
Program terminated with signal 11, Segmentation fault.
#0  0x1c011fea in remove_job (j=0x89c5f508, where=0x3c0025e5 "child") at jobs.c:1520
1520            *prev = curr->next;
(gdb) bt
#0  0x1c011fea in remove_job (j=0x89c5f508, where=0x3c0025e5 "child") at jobs.c:1520
#1  0x1c0105ff in exchild (t=0x3c00b4a0, flags=0, close_fd=-1) at jobs.c:464
#2  0x1c00bcc3 in comexec (t=0x865c3148, tp=0x3c00b4c0, ap=0x853a244c, flags=0) at exec.c:665
#3  0x1c00b0f4 in execute (t=0x865c3148, flags=0) at exec.c:114
#4  0x1c016008 in shell (s=0x89c5f208, toplevel=1) at main.c:567
#5  0x1c015828 in main (argc=0, argv=0xcfbeea50) at main.c:378
(gdb) l
378             shell(s, true); /* doesn't return */
379             return 0;
380     }
381    
382     static void
383     init_username(void)
384     {
385             char *p;
386             struct tbl *vp = global("USER");
387    
(gdb)

/usr/src/usr.sbin/cron% gdb ./cron /var/cron/cron.core
[snip: copyright notice]
warning: exec file is newer than core file.
Core was generated by `cron'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libc.so.38.4...done.
Loaded symbols for /usr/lib/libc.so.38.4
Reading symbols from /usr/libexec/ld.so...done.
Loaded symbols for /usr/libexec/ld.so
#0  0x046c6bff in memset () from /usr/lib/libc.so.38.4
(gdb) bt
#0  0x046c6bff in memset () from /usr/lib/libc.so.38.4
#1  0x00000400 in ?? ()
#2  0x046bd607 in calloc () from /usr/lib/libc.so.38.4
#3  0x046bcf0e in alloc_segs () from /usr/lib/libc.so.38.4
#4  0x046bc062 in __hash_open () from /usr/lib/libc.so.38.4
#5  0x046a4336 in dbopen () from /usr/lib/libc.so.38.4
#6  0x0466f32d in getent () from /usr/lib/libc.so.38.4
#7  0x0466ebea in cgetent () from /usr/lib/libc.so.38.4
#8  0x0465f692 in login_getclass () from /usr/lib/libc.so.38.4
#9  0x1c004f1b in child_process (e=0x7ceb9000, u=0x86d780e0) at do_command.c:201
#10 0x1c00463d in do_command (e=0x7ceb9000, u=0x86d780e0) at do_command.c:55
#11 0x1c0045a6 in job_runqueue () at job.c:73
#12 0x1c0020b5 in main (argc=1, argv=0x3c004564) at cron.c:271
(gdb) l
271                     job_runqueue();
272
273                     /* Run any jobs in the at queue. */
274                     atrun(&at_database, batch_maxload,
275                         timeRunning * SECONDS_PER_MINUTE - GMToff);
276
277                     /* Check to see if we received a signal while running jobs. */
278                     if (got_sighup) {
279                             got_sighup = 0;
280                             log_close();
(gdb)


BTW how do I get the line-numbers in my traces?



# Han

Otto Moerbeek
On Fri, 9 Dec 2005, Han Boetes wrote:

> Cron Daemon wrote:
> > Segmentation fault (core dumped)
>
> In that case I got a few more corefiles to inspect:
>
> ~% ls -l $(locate .core)
> -rw------- 1 han wheel 664160 Dec  9 02:59 /var/cron/cron.core
> -rw------- 1 han wheel 619128 Dec  9 04:59 /var/log/newsyslog.core
> -rw------- 1 han wheel 217660 Dec  9 04:59 /var/log/sh.core
>
> The newsyslog.core is the one I mentioned in the previous email.
>
> But these are a bit trickier, since I don't know what caused them
> exactly.
>
> /usr/src/bin/ksh% gdb ./ksh /var/log/sh.core
> [snip: copyright notice]
> warning: exec file is newer than core file.
> Core was generated by `sh'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x1c011fea in remove_job (j=0x89c5f508, where=0x3c0025e5 "child") at jobs.c:1520
> 1520            *prev = curr->next;
> (gdb) bt
> #0  0x1c011fea in remove_job (j=0x89c5f508, where=0x3c0025e5 "child") at jobs.c:1520
> #1  0x1c0105ff in exchild (t=0x3c00b4a0, flags=0, close_fd=-1) at jobs.c:464
> #2  0x1c00bcc3 in comexec (t=0x865c3148, tp=0x3c00b4c0, ap=0x853a244c, flags=0) at exec.c:665
> #3  0x1c00b0f4 in execute (t=0x865c3148, flags=0) at exec.c:114
> #4  0x1c016008 in shell (s=0x89c5f208, toplevel=1) at main.c:567
> #5  0x1c015828 in main (argc=0, argv=0xcfbeea50) at main.c:378
> (gdb) l
> 378             shell(s, true); /* doesn't return */
> 379             return 0;
> 380     }
> 381    
> 382     static void
> 383     init_username(void)
> 384     {
> 385             char *p;
> 386             struct tbl *vp = global("USER");
> 387    
> (gdb)
>
> /usr/src/usr.sbin/cron% gdb ./cron /var/cron/cron.core
> [snip: copyright notice]
> warning: exec file is newer than core file.
> Core was generated by `cron'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/libc.so.38.4...done.
> Loaded symbols for /usr/lib/libc.so.38.4
> Reading symbols from /usr/libexec/ld.so...done.
> Loaded symbols for /usr/libexec/ld.so
> #0  0x046c6bff in memset () from /usr/lib/libc.so.38.4
> (gdb) bt
> #0  0x046c6bff in memset () from /usr/lib/libc.so.38.4
> #1  0x00000400 in ?? ()
> #2  0x046bd607 in calloc () from /usr/lib/libc.so.38.4
> #3  0x046bcf0e in alloc_segs () from /usr/lib/libc.so.38.4
> #4  0x046bc062 in __hash_open () from /usr/lib/libc.so.38.4
> #5  0x046a4336 in dbopen () from /usr/lib/libc.so.38.4
> #6  0x0466f32d in getent () from /usr/lib/libc.so.38.4
> #7  0x0466ebea in cgetent () from /usr/lib/libc.so.38.4
> #8  0x0465f692 in login_getclass () from /usr/lib/libc.so.38.4
> #9  0x1c004f1b in child_process (e=0x7ceb9000, u=0x86d780e0) at do_command.c:201
> #10 0x1c00463d in do_command (e=0x7ceb9000, u=0x86d780e0) at do_command.c:55
> #11 0x1c0045a6 in job_runqueue () at job.c:73
> #12 0x1c0020b5 in main (argc=1, argv=0x3c004564) at cron.c:271
> (gdb) l
> 271                     job_runqueue();
> 272
> 273                     /* Run any jobs in the at queue. */
> 274                     atrun(&at_database, batch_maxload,
> 275                         timeRunning * SECONDS_PER_MINUTE - GMToff);
> 276
> 277                     /* Check to see if we received a signal while running jobs. */
> 278                     if (got_sighup) {
> 279                             got_sighup = 0;
> 280                             log_close();
> (gdb)
>
>
> BTW how do I get the line-numbers in my traces?

Take a more recent snap; those contain even more debug info for libraries.

The backtraces do not make a lot of sense... from your newsyslog core file:

#0  0x0be74bff in memset () from /usr/lib/libc.so.38.4
(gdb) up
#1  0x7c217160 in ?? ()
(gdb) up
#2  0x0be520c5 in newbuf (hashp=0x88e70000, addr=4294967295, prev_bp=0x0)
    at /usr/src/lib/libc/db/hash/hash_buf.c:185
185                     memset(bp->page, 0xff, hashp->BSIZE);
(gdb) print *bp
$1 = {prev = 0x0, next = 0x0, ovfl = 0x0, addr = 0, page = 0x0, flags = 0 '\0'}
(gdb)

But bp->page has just been allocated, and this malloc is checked:


(gdb) list -4
178                     if ((bp = (BUFHEAD *)malloc(sizeof(BUFHEAD))) == NULL)
179                             return (NULL);
180                     memset(bp, 0xff, sizeof(BUFHEAD));
181                     if ((bp->page = (char *)malloc(hashp->BSIZE)) == NULL) {
182                             free(bp);
183                             return (NULL);
184                     }
185                     memset(bp->page, 0xff, hashp->BSIZE);
186                     if (hashp->nbufs)
187                             hashp->nbufs--;


So something is seriously corrupting memory, or the core dumps and
debug information do not match. Please make sure they are in sync.
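
To make the argument concrete, here is a minimal sketch of the same check-then-fill pattern, outside of libc. The names buf_sketch, newbuf_sketch and freebuf_sketch are made up for illustration and are not the real BUFHEAD or newbuf(). If both allocations are checked like this, the page pointer cannot be NULL by the time memset() runs, which is why a page = 0x0 at that line points at corruption or at mismatched debug information.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libc's BUFHEAD; only the fields needed here. */
struct buf_sketch {
	char	*page;
	size_t	 size;
};

/*
 * Allocate a header and its page, checking both allocations.
 * Returns NULL on failure. On success, page is non-NULL and
 * every byte of it is set to 0xff.
 */
struct buf_sketch *
newbuf_sketch(size_t bsize)
{
	struct buf_sketch *bp;

	if ((bp = malloc(sizeof(*bp))) == NULL)
		return (NULL);
	memset(bp, 0xff, sizeof(*bp));
	if ((bp->page = malloc(bsize)) == NULL) {
		free(bp);
		return (NULL);
	}
	bp->size = bsize;
	/* Only reached when bp->page is a valid allocation. */
	memset(bp->page, 0xff, bsize);
	return (bp);
}

/* Release a buffer created by newbuf_sketch(); NULL is accepted. */
void
freebuf_sketch(struct buf_sketch *bp)
{
	if (bp == NULL)
		return;
	free(bp->page);
	free(bp);
}
```

A caller only has to test the return value against NULL; it never needs to test page separately.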

So far you are the only person reporting these problems. It could be
bad hardware.

        -Otto

Han Boetes
Otto Moerbeek wrote:
> > BTW how do I get the line-numbers in my traces?
>
> Take a more recent snap; those contain even more debug info for
> libraries.

OK, great.


> So something is seriously corrupting memory, or the core dumps
> and debug information do not match. Please make sure they are in
> sync.

Yes, I checked whether any recent changes had been committed before
examining the dump file, and that the cores come from the same
snapshot.
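
For what it's worth, a rough way to automate the timestamp side of that check could look like the sketch below. It only compares modification times, in the spirit of gdb's "exec file is newer than core file" warning. The function name exec_newer_than_core is made up, and I am assuming that mtime is a good enough hint. A newer executable does not prove a mismatch, and an older one does not prove a match.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Compare modification times of an executable and a core file.
 * Returns 1 if the executable was modified after the core file,
 * 0 if it was not, and -1 if either file cannot be stat'ed.
 */
int
exec_newer_than_core(const char *exec_path, const char *core_path)
{
	struct stat ex, co;

	if (stat(exec_path, &ex) == -1 || stat(core_path, &co) == -1)
		return (-1);
	return (ex.st_mtime > co.st_mtime ? 1 : 0);
}
```

Calling it with ./ksh and /var/log/sh.core would presumably return 1 for the session above, since gdb printed the warning there.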


> So far you are the only person reporting these problems. It could
> be bad hardware.

I considered the same, but it seems unlikely since the only time I
get these cores is while booting or while cron is running... I
regularly do md5-like checks on big files and lots of
compiles. Any memory-inconsistencies should show up before that.
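
The kind of check I mean can be sketched like this: hash the same data several times and count how often the result changes. On sane hardware the count should stay at zero. This is only an illustration and not the tool I actually run. FNV-1a stands in for md5 to keep it short, and the names fnv1a and count_hash_mismatches are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* FNV-1a 32-bit hash over a buffer; a simple stand-in for md5. */
static uint32_t
fnv1a(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return (h);
}

/*
 * Fill a buffer with a fixed pattern, hash it once, then hash it
 * again (rounds - 1) more times and count how often the result
 * differs from the first one. Returns -1 if the buffer cannot be
 * allocated.
 */
int
count_hash_mismatches(size_t len, int rounds)
{
	unsigned char *buf;
	uint32_t first;
	size_t i;
	int r, bad = 0;

	if ((buf = malloc(len)) == NULL)
		return (-1);
	for (i = 0; i < len; i++)
		buf[i] = (unsigned char)(i * 31 + 7);
	first = fnv1a(buf, len);
	for (r = 1; r < rounds; r++)
		if (fnv1a(buf, len) != first)
			bad++;
	free(buf);
	return (bad);
}
```

A small buffer mostly exercises the CPU cache, so a real test would want a buffer much larger than that. An optimizing compiler may also collapse the repeated hashing, so build it without optimization if you try it.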

And yes I am running GENERIC and recent snapshots.

Oh well, I'll simply do my best to write good bug reports. We'll
see if something useful shows up.




# Han