Behaviour of fsync() in case of write-back errors

Thomas Munro
Hello,

I work on PostgreSQL.  I don't use OpenBSD, but recently I've been
investigating how fsync() reports write-back errors on all operating
systems that people like to run PostgreSQL on:

https://wiki.postgresql.org/wiki/Fsync_Errors

It seems to me that on OpenBSD, asynchronous write-back errors might
not be reported to userspace in a subsequent call to fsync(), and
synchronous write-back errors that are reported to userspace might not
be reported in a follow-up call to fsync() (that is, retrying will
appear to be successful but in fact your data is gone).
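
To make the concern concrete, here is a minimal sketch in C of how a
caller might have to behave if a retried fsync() cannot be trusted.
The helper name durable_write() is hypothetical and is not taken from
PostgreSQL or OpenBSD; the sketch only assumes the standard POSIX
open(), write(), fsync() and close() calls.  It treats the first
fsync() failure as final instead of retrying, because on a system with
the behaviour described above a second call could return 0 even though
the dirty data was dropped.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write len bytes to path and
 * flush them to stable storage.  Returns 0 on success, -1 on failure.
 */
int durable_write(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: try again */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    if (fsync(fd) != 0) {
        /*
         * Do NOT call fsync() again and trust a later success: the
         * kernel may already have discarded the failed buffers, so the
         * only safe assumption is that the data did not reach disk.
         */
        int saved = errno;
        fprintf(stderr, "fsync(%s) failed: %s\n", path, strerror(saved));
        close(fd);
        errno = saved;
        return -1;
    }

    return close(fd) == 0 ? 0 : -1;
}
```

A caller would check the return value once, for example
durable_write("checkpoint.dat", data, size), and on -1 would rewrite
the data from its own copy rather than retry the flush.  Note that this
only covers the synchronous case: an error from asynchronous write-back
that the kernel never records cannot be detected by this code at all.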

Am I wrong?  Perhaps some other code elsewhere will record the error
at device, filesystem or inode level?  I didn't try to test this: I
simply compared the brelse() error handling code with that of FreeBSD
and NetBSD whose behaviours are known to be correct and incorrect
respectively, according to our assessment of what fsync() *should* do.
(Or at least the behaviour that PostgreSQL relies on, when it reports
that your data exists on disk as part of its checkpoint protocol).
OpenBSD certainly appears to be like NetBSD in this respect, so I
thought it was worth pinging your list and asking for an expert
opinion.

Do you think that my suspicion is correct?  Would you consider that to
be a bug?

Thanks!

Thomas Munro

Re: Behaviour of fsync() in case of write-back errors

Otto Moerbeek
On Fri, Apr 13, 2018 at 11:26:52AM +1200, Thomas Munro wrote:

> Hello,
>
> I work on PostgreSQL.  I don't use OpenBSD, but recently I've been
> investigating how fsync() reports write-back errors on all operating
> systems that people like to run PostgreSQL on:
>
> https://wiki.postgresql.org/wiki/Fsync_Errors
>
> It seems to me that on OpenBSD, asynchronous write-back errors might
> not be reported to userspace in a subsequent call to fsync(), and
> synchronous write-back errors that are reported to userspace might not
> be reported in a follow-up call to fsync() (that is, retrying will
> appear to be successful but in fact your data is gone).
>
> Am I wrong?  Perhaps some other code elsewhere will record the error
> at device, filesystem or inode level?  I didn't try to test this: I
> simply compared the brelse() error handling code with that of FreeBSD
> and NetBSD whose behaviours are known to be correct and incorrect
> respectively, according to our assessment of what fsync() *should* do.
> (Or at least the behaviour that PostgreSQL relies on, when it reports
> that your data exists on disk as part of its checkpoint protocol).
> OpenBSD certainly appears to be like NetBSD in this respect, so I
> thought it was worth pinging your list and asking for an expert
> opinion.
>
> Do you think that my suspicion is correct?  Would you consider that to
> be a bug?
>
> Thanks!
>
> Thomas Munro

Hi,

I noticed this report, but I have not had time to look into it yet.
Either I will look into it in the near future, or I'll try to
find another developer who can.

        -Otto