OpenBSD 6.8 GENERIC#5 i386
One of my systems rebooted at 03:01 local time today. I've seen kernel panics and bad hardware but I've never seen OpenBSD "just reboot" by itself, ever.

There's no cron job that would do this. last(1) is no help; it shows the reboot command but not the shutdown that preceded it:

root@ns ~ 4# last -f /var/log/wtmp.0
reboot    ~                         Sat Mar 27 03:01
root      ttyp0    192.168.0.132    Wed Mar 24 11:23 - 11:23  (00:00)

wtmp.0 begins Wed Mar 24 11:23 2021
root@ns ~ 5# last -f /var/log/wtmp.1
root      ttyp0    192.168.0.132    Tue Mar 16 21:30 - 21:30  (00:00)
root      ttyp0    75.82.86.131     Tue Mar 16 13:14 - 21:30  (08:15)
root      ttyp0    75.82.86.131     Sun Mar 14 21:20 - 21:29  (00:08)
root      ttyp0    75.82.86.131     Sat Mar 13 17:42 - 21:13  (03:31)

The date gaps seem odd. I've ssh'd into this system multiple times between March 16-27. I don't see other signs of trouble in /var/log.

I could use some help in looking for evidence of foul play, or "just" a hardware or software problem.

Thanks in advance for further troubleshooting clues.

dn
On 3/27/21 10:27 PM, David Newman wrote:
> One of my systems rebooted at 03:01 local time today. I've seen kernel
> panics and bad hardware but I've never seen OpenBSD "just reboot" by
> itself, ever.
> [...]

What kind of a machine is it running on? I remember having reboot problems on certain HP and Supermicro servers with hardware watchdogs.

--
Kristjan Komloši
On 3/28/21 4:58 AM, Kristjan Komloši wrote:
> [...]
> What kind of a machine is it running on? I remember having reboot
> problems on certain HP and Supermicro servers with hardware watchdogs.

This is a 10+-year-old Dell 1U server with a 2-GHz Celeron 440, part of a pair running CARP. Aside from having to replace spinning disks with SSDs a couple of years ago, they've been rock solid.

I too have seen issues with Supermicros but that's with other OSs. I've never had a spontaneous reboot on this system, and am concerned from the wtmp stuff above that this *may* have been triggered externally. I could use some clues on other things to check. Thanks.

dn
On 2021-03-28, David Newman <[hidden email]> wrote:
> [...]
> I too have seen issues with Supermicros but that's with other OSs. I've
> never had a spontaneous reboot, on this system, and am concerned from
> the wtmp stuff above that this *may* have been triggered externally. I
> could use some clues in other things to check. Thanks.

The "reboot" wtmp entry is written by init(8).

It is something that could possibly be caused by bad hardware or a glitch in the power feed, amongst other options (the latter may affect some machines differently than others).

Perhaps it's worth enabling accounting in rc.conf.local to see if you can figure out if any commands are executed around that time if it happens again.
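To make the point about init(8) concrete: last(1) prints records newest first, so the line that follows a "reboot" line is whatever the log held just before that boot. The sketch below assumes that a clean shutdown(8) or reboot(8) leaves a "shutdown" pseudo-user record (my understanding of the stock tools, not something stated in this thread) and flags any "reboot" line that is not followed by one. The function name check_reboots and the message wording are made up for illustration.

```sh
# check_reboots: read last(1) output (newest first) on stdin and flag each
# "reboot" record whose next-older record is not a "shutdown" record.
# Assumption: a clean shutdown logs a "shutdown" pseudo-user line.
check_reboots() {
    awk '
        # Skip blank lines and the "wtmp.N begins ..." footer.
        NF == 0 || $2 == "begins" { next }

        # A reboot is waiting: this line is the record just before it.
        pending != "" {
            if ($1 != "shutdown")
                print "no shutdown record before: " pending
            pending = ""
        }

        # Remember a reboot record until the next-older line is seen.
        $1 == "reboot" { pending = $0 }

        # The log ends at a reboot record, so nothing older is available.
        END {
            if (pending != "")
                print "log starts at reboot, cannot tell: " pending
        }
    '
}

# On the machine itself (not run here):
#   last -f /var/log/wtmp.0 | check_reboots
```

Fed the wtmp.0 listing from the original post, this should flag the Sat Mar 27 03:01 record, because the next-older line is a root login rather than a shutdown. That only says the system went down without logging a shutdown; it does not distinguish a power glitch from a crash or a reset.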
On Sun, 28 Mar 2021, Stuart Henderson wrote:
> It is something that could possibly be caused by bad hardware or a
> glitch in the power feed amongst other options (the latter may affect
> some machines differently than others).

I've had a string of power "blips" over the last year or so. Oddly enough, the OpenBSD machine always stays up and a Debian machine next to it on the same power strip reboots. I always figured it was due to the superior operating system ;)
On 3/28/21 12:13 PM, David Newman wrote:
>>> One of my systems rebooted at 03:01 local time today. I've seen kernel
>>> panics and bad hardware but I've never seen OpenBSD "just reboot" by
>>> itself, ever.

OpenBSD, not usually. Hardware OpenBSD is running on? Sure.

> [...]
> This is a 10+-year-old Dell 1U server with a 2-GHz Celeron 440, part of
> a pair running CARP. Aside from having to replace spinning disks with
> SSDs a couple of years ago, they've been rock solid.

Basic machine, worked for a long time, then starts giving problems: almost certainly a hw problem unless you can tie the problem to a recent upgrade. And that's not terribly likely on "basic" hardware.

Every broken device started out "rock solid" ... until it isn't. That's the definition of "Broken".

> I too have seen issues with Supermicros but that's with other OSs. I've
> never had a spontaneous reboot, on this system, and am concerned from
> the wtmp stuff above that this *may* have been triggered externally. I
> could use some clues in other things to check. Thanks.

As Stuart pointed out, that comes from the boot process, not the shutdown.

If you are really curious, you could put a serial console on it and wait for the next event. PROBABLY won't see much, however.

Believe me, I'm all in favor of recycling computers -- in fact, as I often tell skeptical employers, I'd rather have two ten year old systems than one brand new system with a service contract, but computers don't last as long as they used to, and curiously, some big-name servers seem to sometimes have a shorter life than some desktops. A ten year old computer that does the job reliably is good, but not an expectation.

Nick.
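For the serial console idea, here is a minimal sketch, assuming the machine has a first serial port (com0) and boots through the standard i386 boot(8) loader, which reads /etc/boot.conf. The function name add_serial_console, its optional path argument, and the 115200 speed are illustrative choices, not anything specified in this thread.

```sh
# add_serial_console: append boot(8) commands that select the first serial
# port as the console. Arguments (both optional, both illustrative):
#   $1  file to append to   (default /etc/boot.conf)
#   $2  line speed          (default 115200, an assumption)
add_serial_console() {
    conf=${1:-/etc/boot.conf}
    speed=${2:-115200}
    {
        echo "stty com0 $speed"   # set the speed of the first serial port
        echo "set tty com0"       # switch the console to that port
    } >> "$conf"
}

# On the real machine, as root (not run here):
#   add_serial_console
# Then capture the port from a second machine, for example with cu(1);
# the device name depends on the capturing machine:
#   cu -l cua00 -s 115200
```

A second machine has to record the port continuously, since whatever the kernel prints just before a reset is only useful if something captured it. As Nick says, a power or hardware reset may print nothing at all.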
On 3/29/21 5:28 AM, Nick Holland wrote:
> [...]
> Believe me, I'm all in favor of recycling computers -- in fact, as I
> often tell skeptical employers, I'd rather have two ten year old systems
> than one brand new system with a service contract, but computers don't
> last as long as they used to, and curiously, some big-name servers seem
> to sometimes have a shorter life than some desktops. A ten year old
> computer that does the job reliably is good, but not an expectation.

I hope it is "just" a hardware problem. These ancient machines don't owe me anything. If anything they've been a testament to how well OpenBSD just works, year in, year out.

Until I can swap in a replacement (the unit in question is in a colo in another state), I may try Stuart's suggestion of enabling accounting.

The only concern I have about an external actor is that there seem to be some missing entries in wtmp, but I don't know enough about init or wtmp to rule out a hardware glitch.

Someone else suggested a battery problem, which seems plausible for a unit this old.

Appreciate all the feedback -- many thanks.

dn
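Stuart's suggestion amounts to setting accounting=YES in /etc/rc.conf.local. A minimal sketch follows, assuming a stock /etc/rc that honours that variable and the default accounting file /var/account/acct. The function name enable_accounting and its optional path argument are illustrative only.

```sh
# enable_accounting: add accounting=YES to an rc.conf.local-style file
# unless the line is already present.
#   $1  file to edit (default /etc/rc.conf.local)
enable_accounting() {
    conf=${1:-/etc/rc.conf.local}
    # Nothing to do if accounting is already switched on in this file.
    if grep -q '^accounting=YES' "$conf" 2>/dev/null; then
        return 0
    fi
    echo 'accounting=YES' >> "$conf"
}

# On the real machine, as root (not run here):
#   enable_accounting
#   touch /var/account/acct      # the accounting file must exist
#   accton /var/account/acct     # start accounting now, without a reboot
#   lastcomm | less              # later: review the recorded commands
```

Accounting records are written as processes exit, so after another event lastcomm(1) can show what finished running shortly before the reboot. Anything still running when the machine loses power will likely not appear.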
On Sun, Mar 28, 2021 at 08:05:58PM -0000, Stuart Henderson wrote:
[...]
> It is something that could possibly be caused by bad hardware or a
> glitch in the power feed amongst other options (the latter may affect
> some machines differently than others).

Power glitch, bad power supply, bad RAM, ...

Do you have a UPS? If so I bet it's a hardware problem.
>One of my systems rebooted at 03:01 local time today.
Do you happen to have a cat nearby?
On 4/1/21 2:51 PM, Rafael Possamai wrote:
>> One of my systems rebooted at 03:01 local time today.
>
> Do you happen to have a cat nearby?

:-) I'm allergic, and this box is in a colo.

Appreciate all the feedback. I've enabled accounting per Stuart's suggestion and am pretty sure this is a hiccup on old hardware.

dn