Filesystem redundancy

Filesystem redundancy

Julian Smith
I've been wondering about how to cope with random hardware failures when
data is being received from a WAN and written to local storage. As I
understand it, CARP(4) will enable any one of N machines to handle
incoming requests, so the failure of up to N-1 machines can be
tolerated.

But if each of these machines writes received data (e.g. emails) to a
shared hard drive, then we are back to a single point of failure (if
each machine writes to its own individual hard drive(s) then we end up
with no sharing of data). We can make the drive use RAID, but RAID
controllers can also fail.

One way of handling this would be to write a filesystem that copies the
contents of modified files over a network before close() returns. That
way, as long as an SMTP server (say) checks the return from close()
before telling the sender that it has received everything OK, we can
avoid any single point of failure. If the data is copied to all the
other machines in a CARP "family", then we should end up with perfectly
synchronised machines, each of which can take over at any time. The
obvious downside is potential speed problems.
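
As a minimal sketch (deliver_message() is an invented helper, not part
of any real MTA), the check amounts to refusing to acknowledge until
both fsync() and close() have succeeded, so a failed replication inside
close() propagates back to the sender:

/*
 * Minimal sketch only: deliver_message() is a made-up stand-in for the
 * final delivery step of an SMTP server; fd holds the received message.
 */
#include <unistd.h>

int
deliver_message(int fd)
{
    if (fsync(fd) == -1)        /* make sure the data reached stable storage */
        return -1;
    if (close(fd) == -1)        /* with a replicating close(), failure here
                                   means the copies were not made */
        return -1;
    return 0;                   /* only now send "250 OK" to the sender */
}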

This has the nice property that the unit of replacement is individual
machines, with no need for complicated and expensive hardware like
Network Attached Storage/RAID. If something fails, install a fresh
machine, sync its hard drives a few times with one of the other machines
(whose contents will be changing due to incoming data from the WAN),
temporarily turn off the WAN, sync a final time, and restore the WAN.

I've written a simple test library, dupfs, that does this by
intercepting open() and close() with LD_PRELOAD and using
system("rsync ...") to do the synchronisation, and it works in trivial
test cases. Any simple-minded file-locking by dupfs would, I think,
lead to deadlock, so something else (CARP?) would have to ensure that
only one of the machines is active at any time.
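
For concreteness, a rough sketch of what such an interposer can look
like follows; it is not the actual dupfs code, and the peer name, table
size and rsync options are invented for illustration. open() records
the path of each file opened for writing, and close() reports success
only if the file has also been rsynced to the peer:

/*
 * Rough sketch of the LD_PRELOAD approach, NOT the actual dupfs code.
 * PEER, MAX_FDS and the rsync options are placeholders. Many details
 * are deliberately ignored (dup(), fork(), threads, and the fact that
 * system() itself goes back through the C library).
 */
#define _GNU_SOURCE             /* for RTLD_NEXT with glibc; harmless elsewhere */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PEER    "replica:/var/spool/"   /* hypothetical destination */
#define MAX_FDS 1024

static char *paths[MAX_FDS];            /* fd -> path, recorded at open() */

int
open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;
    va_list ap;
    int fd;

    if (real_open == NULL)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    if (flags & O_CREAT) {
        va_start(ap, flags);
        mode = va_arg(ap, int);
        va_end(ap);
    }

    fd = real_open(path, flags, mode);
    if (fd >= 0 && fd < MAX_FDS && (flags & (O_WRONLY | O_RDWR))) {
        free(paths[fd]);
        paths[fd] = strdup(path);       /* remember files opened for writing */
    }
    return fd;
}

int
close(int fd)
{
    static int (*real_close)(int);
    int rc;

    if (real_close == NULL)
        real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");

    rc = real_close(fd);

    if (rc == 0 && fd >= 0 && fd < MAX_FDS && paths[fd] != NULL) {
        char cmd[2048];

        snprintf(cmd, sizeof(cmd), "rsync -a -- '%s' " PEER, paths[fd]);
        if (system(cmd) != 0)
            rc = -1;                    /* surface a failed replication */
        free(paths[fd]);
        paths[fd] = NULL;
    }
    return rc;
}

Built as a shared object and loaded with LD_PRELOAD, a sketch like this
also shows where the real complexity lives: tracking descriptors across
dup() and fork(), avoiding recursion through system(), and deciding
what a failed close() should mean to the application.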

I expected there to be standard solutions to this sort of problem, but I
was unable to find anything which didn't involve expensive hardware.
ISPs seem to accept that they will suffer downtime due to hardware
failure, and occasionally lose emails.

So, am I barking up the wrong tree here? What am I missing?

- Julian

--
http://www.op59.net/

Re: Filesystem redundancy

Per-Erik Persson
AFS would handle your storage in a redundant and distributed way, where
you can "easily" add and remove machines. But it is not something you
set up in an afternoon :-)
People seem to be afraid of it because of its complexity, but once the
work is done you wonder why people pay huge amounts for NAS and similar
things that sometimes don't work nearly as well as the glossy brochure
promised. It scales well, but I don't know about the performance.

A while ago there were some discussions on the list about OpenAFS; has
anyone written a complete, or at least half-done, installation guide
yet?

Julian Smith wrote:

>I've been wondering about how to cope with random hardware failures when
>data is being received from a WAN and written to local storage. [...]
>
>So, am I barking up the wrong tree here? What am I missing?
>
>- Julian

Re: Filesystem redundancy

knitti
In reply to this post by Julian Smith
There are SCSI enclosures with the ability to connect to two different
SCSI buses, so they can be accessed from two different machines.
I _think_ the SCSI architecture could allow more than one host adapter
on a bus, _but_ I've never heard of anyone doing this. I presume it
would also depend on the host adapter and the driver.


--knitti

Re: Filesystem redundancy

Martin Schröder
In reply to this post by Julian Smith
On 2005-11-16 11:08:51 +0000, Julian Smith wrote:
> One way of handling this would be to write a filesystem that copies the
> contents of modified files over a network before close() returns. That
> way, as long as an SMTP server (say) checks the return from close()
> before telling the sender that it has received everything OK, we can
> avoid any single point of failure. If the data is copied to all the
> other machines in a CARP "family", then we should end up with perfectly
> synchronised machines, each of which can take over at any time. The
> obvious downside is potential speed problems.

Check out DRBD. Remote, shared RAID1. Sadly, it's Linux only.

Best
    Martin
--
                    http://www.tm.oneiros.de

Re: Filesystem redundancy

Marco Peereboom
In reply to this post by knitti
This is actually pretty common, believe it or not. It does not provide
filesystem redundancy, though. What it provides is a mechanism for
multiple servers to touch the same disks. There is clearly some danger
here, since you can't have multiple machines touching the same
filesystem. So what people tend to do is have some sort of monitoring
application check whether the other machine is still up; when it dies,
the survivor simply takes over the filesystem from the failed machine.

There is even an open-source product called "Fail Safe" that provides
the monitoring-app functionality. Last time I used it, it wasn't very
robust, but it did have all the required knobs to make such a thing
work.
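
As an illustration of that takeover scheme (this is not Fail Safe; the
peer host name, device, mount point and service command are invented),
the monitor on the standby machine can be little more than a loop that
pings the active machine and, after enough missed replies, mounts the
shared disk and starts the service itself:

/*
 * Illustrative heartbeat/takeover loop, not a real HA product. The peer
 * host, device, mount point and service command are placeholders. A
 * real setup also needs fencing, so the "dead" machine cannot come back
 * and write to the same filesystem behind the survivor's back.
 */
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    int missed = 0;

    for (;;) {
        /* one echo request; non-zero exit status means no reply
           (a real monitor would also bound how long it waits) */
        if (system("ping -c 1 peer-host >/dev/null 2>&1") != 0)
            missed++;
        else
            missed = 0;

        if (missed >= 3) {
            /* peer presumed dead: take over its disk and its service */
            system("fsck -p /dev/sd1a && mount /dev/sd1a /shared");
            system("/usr/local/sbin/start-mail-service");
            break;
        }
        sleep(5);
    }
    return 0;
}

The hard part is not the loop but everything around it: split-brain,
fencing, and making sure the filesystem is clean before it is mounted.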

/marco

On Nov 16, 2005, at 7:35 AM, knitti wrote:

> There are SCSI enclosures with the ability to connect to two different
> SCSI buses, so they can be accessed from two different machines. [...]

Re: Filesystem redundancy

Will H. Backman
In reply to this post by Julian Smith
Marco Peereboom wrote:

> This is actually pretty common, believe it or not. It does not provide
> filesystem redundancy, though. What it provides is a mechanism for
> multiple servers to touch the same disks. [...]

Maybe OpenBSD can merge with OpenVMS, which should be easy given that
four of the letters are already the same.  OpenVMS has some amazing
clustering capabilities.

Re: Filesystem redundancy

Tobias Weingartner-2
On Wednesday, November 16, "Will H. Backman" wrote:
>
> Maybe OpenBSD can merge with OpenVMS, which should be easy given that
> four of the letters are already the same.  OpenVMS has some amazing
> clustering capabilities.

It's actually 5 letters... and if *you* can't even get that
much right, how the *HELL* is such a merge ever going to get
properly done!?!  :)

--Toby.

Re: Filesystem redundancy

Joachim Schipper
In reply to this post by Per-Erik Persson
On Wed, Nov 16, 2005 at 02:01:01PM +0100, Per-Erik Persson wrote:

> AFS would handle your storage in a redundant and distributed way, where
> you can "easily" add and remove machines. [...]

I am sorry, could you elaborate? I recall, from my last look at OpenAFS,
that there was no way to replicate a live, read-write filesystem in
real time.

It did offer distributed/redundant read-only filesystems, and it seemed
quite easy to add some servers - but I saw no distributed, redundant
read-write filesystems. Am I just stupid? Behind? (Admittedly, the
OpenAFS documentation on the site seems out of date...)

There should be a semi-automatic installation script in the archives,
posted no more than a week (and probably much less) after 3.8-release
came out.

                Joachim

Re: Filesystem redundancy

Julian Smith
On Wed, 16 Nov 2005 22:54:02 +0100
Joachim Schipper <[hidden email]> wrote:

> On Wed, Nov 16, 2005 at 02:01:01PM +0100, Per-Erik Persson wrote:
> > AFS would handle your storage in a redundant and distributed way,
> > where you can "easily" add and remove machines. [...]
>
> I am sorry, could you elaborate? I recall, from my last look at
> OpenAFS, that there was no way to replicate a live, read-write
> filesystem in real time.
>
> It did offer distributed/redundant read-only filesystems, and it
> seemed quite easy to add some servers - but I saw no distributed,
> redundant read-write filesystems. Am I just stupid? Behind?
> (Admittedly, the OpenAFS documentation on the site seems out of
> date...)

I'm not sure I fully understand exactly what AFS offers in this area,
but it comes with a lot of extra stuff that I don't need, and also seems
very complicated.

I think I'll persevere with dupfs/LD_PRELOAD, and see how well it works
in practice.

Thanks,

- Jules

--
http://www.op59.net/

Re: Filesystem redundancy

Joachim Schipper
On Mon, Nov 28, 2005 at 12:50:13PM +0000, Julian Smith wrote:
<on filesystem redundancy>

> I'm not sure I fully understand exactly what AFS offers in this area,
> but it comes with a lot of extra stuff that I don't need, and also seems
> very complicated.
>
> I think I'll persevere with dupfs/LD_PRELOAD, and see how well it works
> in practice.

Let us know if you manage to make it work reliably, or at least well
enough that it could be made into something workable.

                Joachim