sparc64+swapping+RF: dying processes

Christian Weisgerber
For several months now, I've noticed that sometimes processes on
my -current sparc64 (Blade 100) just die.  I've now realized that
this always happens when the machine is swapping.  named (I run a
caching name server there) is a prominent victim and the occasional
ssh session, probably because these processes are swapped out.

Is anybody else seeing this?

The machine is running and swapping off RAIDframe (a simple RAID 0
over two disks).  Apart from the addition of raid(4), the kernel
is GENERIC.

--
Christian "naddy" Weisgerber                          [hidden email]

Re: sparc64+swapping+RF: dying processes

Zak-3
* Christian Weisgerber <[hidden email]> [2006-10-13 20:23:39 +0000]:
Hi,
|> For several months now, I've noticed that sometimes processes on
|> my -current sparc64 (Blade 100) just die.  I've now realized that
|> this always happens when the machine is swapping.  named (I run a
|> caching name server there) is a prominent victim and the occasional
|> ssh session, probably because these processes are swapped out.
|>
|> Is anybody else seeing this?
|>
|> The machine is running and swapping off RAIDframe (a simple RAID 0
|> over two disks).  Apart from the addition of raid(4), the kernel
|> is GENERIC.
I can confirm this. It seems to happen when a lot of processes are
running. The easiest way to reproduce it in my case is to run Apache with
some crappy PHP app like WordPress and give it lots of hits. When swapping
ramps up and down heavily, daemons (usually clamav-milter) start dying,
and httpd child processes kill themselves off.

My current 'workaround' is setting the priority of whichever process swaps
the most (httpd in my case) to 8 or so, which seems to keep the other
processes from dying as often. This isn't really new to me, though; I've
been encountering it for the past 6 months or so. I recently updated to
the latest snapshot and the problem still occurs.

This is on a sparc64 Netra without RAID, with 512MB of RAM. The only
difference in my kernel is nkmempages cranked up to 32768, which helps
stop some weird panics I used to get. dmesg and other info are available
upon request, but since it's a production box, I'd really hate to fiddle
with it too much at this time.

Adjusting the priority of various daemons and intensive programs, plus a
fairly heavily modified login.conf, seems to reduce how often a process
crashes or restarts itself. With the default priority of 0 for
everything, it used to happen daily, almost every 4-6 hours when swapping
hit hard; with the 'workaround' it's now about every 3-4 days.

cheers

//zak
--
    "Only two things are infinite, the universe and human
     stupidity, and I'm not sure about the former."
                        - Albert Einstein

Re: sparc64+swapping+RF: dying processes

Thomas Alexander Frederiksen-2
In reply to this post by Christian Weisgerber
Christian Weisgerber wrote:
> For several months now, I've noticed that sometimes processes on
> my -current sparc64 (Blade 100) just die.  I've now realized that
> this always happens when the machine is swapping.  named (I run a
> caching name server there) is a prominent victim and the occasional
> ssh session, probably because these processes are swapped out.
>
> Is anybody else seeing this?

Yes, on a server I've taken out of production for that very reason, but
haven't had time to do any testing on. It's a Netra T105 with 64MB of
RAM running 3.9 with a bunch of services on top, and thus quite a bit of
swapping going on.

> The machine is running and swapping off RAIDframe (a simple RAID 0
> over two disks).  Apart from the addition of raid(4), the kernel
> is GENERIC.

If I recall correctly I was running with swap off RAIDframe as well, but
I'm not entirely sure.

Apart from the random process deaths, it had the nasty habit of crashing
hard (as in no response from the LOM-system either) every 20 to 30 days.
This may or may not be related to the same issue, but is more likely to
be flaky hardware. When I get a chance to bring it back online I'll do
further testing, including an upgrade to 4.0.

--
Thomas A. Frederiksen

Re: sparc64+swapping+RF: dying processes

Raymond Lillard-2
In reply to this post by Christian Weisgerber
Christian Weisgerber wrote:

> For several months now, I've noticed that sometimes processes on
> my -current sparc64 (Blade 100) just die.  I've now realized that
> this always happens when the machine is swapping.  named (I run a
> caching name server there) is a prominent victim and the occasional
> ssh session, probably because these processes are swapped out.
>
> Is anybody else seeing this?
>
> The machine is running and swapping off RAIDframe (a simple RAID 0
> over two disks).  Apart from the addition of raid(4), the kernel
> is GENERIC.
>

Just by way of confirmation of the swapping thing, I run
Netra T1-105s with plenty of RAM at several small business
clients of mine and they are as solid as the Rock of Gibraltar.
They run DNS, Dovecot, Apache and pf.  The only trouble I have
on these machines is with Dovecot, and those are clearly
Dovecot issues (which is a whole other can of worms).

The kernels are stock 3.8 and 3.9 -stable with no RAIDframe.

I personally don't trust RF, but if someone can fix it,
I might change my opinion.

Regards all,
Ray

Re: sparc64+swapping+RF: dying processes

Sebastian Schmitzdorff
In reply to this post by Thomas Alexander Frederiksen-2
Hi,

On Friday, 13.10.2006, at 23:00 +0200, Thomas Alexander
Frederiksen wrote:

> Christian Weisgerber wrote:
> > For several months now, I've noticed that sometimes processes on
> > my -current sparc64 (Blade 100) just die.  I've now realized that
> > this always happens when the machine is swapping.  named (I run a
> > caching name server there) is a prominent victim and the occasional
> > ssh session, probably because these processes are swapped out.
> >
> > Is anybody else seeing this?
>
> Yes, on a server I've taken out of production for that very reason, but
> haven't had time to do any testing on. It's a Netra T105 with 64MB of
> RAM running 3.9 with a bunch of services on top, and thus quite a bit of
> swapping going on.
>
> > The machine is running and swapping off RAIDframe (a simple RAID 0
> > over two disks).  Apart from the addition of raid(4), the kernel
> > is GENERIC.
>
> If I recall correctly I was running with swap off RAIDframe as well, but
> I'm not entirely sure.
>
> Apart from the random process deaths, it had the nasty habit of crashing
> hard (as in no response from the LOM-system either) every 20 to 30 days.
> This may or may not be related to the same issue, but is more likely to
> be flaky hardware. When I get a chance to bring it back online I'll do
> further testing, including an upgrade to 4.0.
>
I run OpenBSD 3.9 on a Netra X1 with 1GB of RAM, also on RAIDframe
(RAID 1), and it crashes every 20 to 30 days too. There were no random
process deaths at all, though. I was already thinking of replacing the
system board, since I still have a spare one, but reading your mail makes
me reconsider. I'm not sure yet how to get to the bottom of this problem,
but I will attach a serial cable in the next few days and hopefully get
something useful out of it the next time it crashes.

sebastian