postgresql-server exiting abnormally after upgrade to -snapshot


Hugo Osvaldo Barrera-2
Hi,

I upgraded to -snapshot today, and went through the usual postgresql upgrade
procedure: pg_dump, moved the old db out of the way, re-init'd, started, and
imported.

The thing is, upon receiving connections, postgres dies horribly. The log is
just the following, repeating over and over:

  WARNING:  terminating connection because of crash of another server process
  DETAIL:  The postmaster has commanded this server process to roll back the
  current transaction and exit, because another server process exited abnormally
  and possibly corrupted shared memory.
  HINT:  In a moment you should be able to reconnect to the database and repeat
  your command.
  LOG:  all server processes terminated; reinitializing
  LOG:  database system was interrupted; last known up at 2015-02-11 17:01:00 GMT
  LOG:  database system was not properly shut down; automatic recovery in
  progress
  LOG:  record with zero length at 0/1696370
  LOG:  redo is not required
  LOG:  database system is ready to accept connections
  LOG:  autovacuum launcher started
  LOG:  server process (PID 9444) was terminated by signal 6: Abort trap
  LOG:  terminating any other active server processes

After much frustration (even building -current), I deleted all of it,
uninstalled, built 9.3.4 using the old ports recipe, and installed it: same
issue!

It's clearly not an upgrade issue, because deleting all the data files and
going back to 9.3 gives the same result.

Has anyone else had this issue, or similar issues with -snapshot/-current? Can
someone else confirm that postgres 9.4 works fine on the latest -snapshot? (The
confirmation would help reaffirm that it's not an issue with some dependency
or library.)

Thanks,

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Hugo Osvaldo Barrera-2
On 2015-02-11 19:54, Jan Stary wrote:
> Have you stopped the DB server before performing the upgrade?
> Are you sure (pgrep -fl post) that there is no other server process
> around?
>
> Jan
>

Yes, I did. I also did this when installing the version I built from ports
(which I also tried with no change).

I actually did the entire process a few times, with -snapshots, -current and
installing from packages.

All exhibited the same behaviour, so I'm starting to suspect the issue is not
postgres per se.

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Stuart Henderson
In reply to this post by Hugo Osvaldo Barrera-2
On 2015-02-11, Hugo Osvaldo Barrera <[hidden email]> wrote:
> Can someone else confirm that postgres 9.4 works fine on the latest
> -snapshot? (The confirmation would help reaffirm that it's not an issue with
> some dependency or library.)

Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on amd64.


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Hugo Osvaldo Barrera-2
On 2015-02-12 10:18, Stuart Henderson wrote:
> Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on amd64.
>

Ok, so now I know that the issue is on my end, which leaves me even more
confused. You're running the latest snapshots too, right (e.g. the ones from
Feb 10th)?

Aside from a clean install, do you have any other changes? Perhaps login.conf?

Thanks,

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Stuart Henderson
On 2015-02-12, Hugo Osvaldo Barrera <[hidden email]> wrote:

> Ok, so now I know that the issue is on my end. Which leaves me even more
> confused. You're running the latest snapshots too, right? (eg: the ones from
> feb 10th?).
>
> Aside from a clean install, do you have any more changes? Perhaps login.conf?

I have the login.conf section from the example in the pkg-readme,

postgresql:\
        :openfiles-cur=768:\
        :tc=daemon:

and this in sysctl.conf

# postgresql
kern.seminfo.semmni=256
kern.seminfo.semmns=2048
kern.shminfo.shmmax=50331648

<sthen@hutch:~:532>$ ls -l /bin/ls /usr/local/bin/postgres
-r-xr-xr-x  1 root  bin   267968 Feb 10 23:19 /bin/ls*
-r-xr-xr-x  1 root  bin  6508711 Feb  9 03:21 /usr/local/bin/postgres*

<sthen@hutch:~:533>$ sysctl kern.version
kern.version=OpenBSD 5.7-beta (GENERIC) #797: Tue Feb 10 16:26:12 MST 2015
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Hugo Osvaldo Barrera-2
On 2015-02-13 13:20, Stuart Henderson wrote:
>
> I have the login.conf section from the example in the pkg-readme,
>
> postgresql:\
>         :openfiles-cur=768:\
>         :tc=daemon:
>
> and this in sysctl.conf
>
> # postgresql
> kern.seminfo.semmni=256
> kern.seminfo.semmns=2048
> kern.shminfo.shmmax=50331648
>
> <sthen@hutch:~:532>$ ls -l /bin/ls /usr/local/bin/postgres
> -r-xr-xr-x  1 root  bin   267968 Feb 10 23:19 /bin/ls*
> -r-xr-xr-x  1 root  bin  6508711 Feb  9 03:21 /usr/local/bin/postgres*
>
> <sthen@hutch:~:533>$ sysctl kern.version
> kern.version=OpenBSD 5.7-beta (GENERIC) #797: Tue Feb 10 16:26:12 MST 2015
>     [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC
>

Thanks for all the details. It looks like almost everything is identical
except our kernels (I had edited a few extra fields in sysctl.conf for pg, but
reverted them just to make sure they weren't causing trouble).

  # sysctl kern.version
  kern.version=OpenBSD 5.7-beta (GENERIC.MP) #852: Tue Feb 10 16:31:16 MST 2015
      [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

I switched to the SP kernel just to rule out any possible regressions that
might be affecting this scenario, but no change.

It looks like the issue is elsewhere, but I've no idea where to look. So far
I've also failed to build postgresql-server with debug symbols enabled, but
that's just lack of knowledge on my part.

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Abel Abraham Camarillo Ojeda-2
On Sat, Feb 14, 2015 at 2:12 AM, Hugo Osvaldo Barrera <[hidden email]> wrote:

> It looks like the issue is elsewhere, but I've no idea where to look. I've so
> far failed to build postgresql-server with debug symbols enabled too, but
> that's just lack of knowledge on my part.

You should give more information about how to reproduce this problem. How
reliably can you reproduce it? Are you sending one particular query, and does
it always crash?

You should get more error context: maybe set log_statement in postgresql.conf
to log all statements and see which one crashes it...

http://www.postgresql.org/docs/9.4/static/runtime-config-logging.html

Are you using any custom C extension?

Did you dump and restore the database? Did you use 'custom format' or
'plain format'?
Were there any errors on import? postgres only warns about some import
errors which in my opinion are severe...


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Hugo Osvaldo Barrera-2
On 2015-02-14 02:28, Abel Abraham Camarillo Ojeda wrote:
> You should give more information about how to reproduce this problem. How
> reliably can you reproduce it? Are you sending one particular query, and does
> it always crash?
>

It crashes extremely frequently. I haven't noticed a pattern, and the server
never lives more than a few seconds. No particular query seems to trigger it,
and adding log_statement showed that it may even crash *before* any queries
are executed (see below as well).

> You should get more error context: maybe set log_statement in postgresql.conf
> to log all statements and see which one crashes it...
>
> http://www.postgresql.org/docs/9.4/static/runtime-config-logging.html
>
> are you using any custom C extension?
>

Nope, this is a plain default install from snapshots with nothing extra.

> Did you dump and restore the database? Did you use 'custom format' or
> 'plain format'?

My latest tests reproduce the same issue on a clean "out-of-the-box" db (i.e.
without importing any data).

> Were there any errors on import? postgres only warns about some import
> errors which in my opinion are severe...

This is a log with log_statement and most logging turned on. I'd only run the
server *once* after initialization before this. The database was completely
empty:

http://sprunge.us/UVGj

While a query managed to get through once, the server usually crashes before
that happens.

Here's another, finer-grained log, with nothing useful (apparently) either:

http://sprunge.us/FQaJ

Thanks,

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Joel Sing-3
On Saturday 14 February 2015, Hugo Osvaldo Barrera wrote:

> This is a log with log_statement and most logging turned on. I'd only run
> the server *once* after initialization before this. The database was
> completely empty:
>
> http://sprunge.us/UVGj
>
> While a query managed to get through once, the server usually crashes
> before that happens.

The interesting/useful part is:

LOG:  statement: SELECT ... ORDER BY c.oid
LOG:  server process (PID 11531) was terminated by signal 6: Abort trap

So the server process is being sent a SIGABRT, which is causing it to
terminate. There is a good chance that this is coming from the stack
protector, which sends a SIGABRT if the stack is smashed.

Is there anything in dmesg or syslog that correlates?

Failing that, your next step is likely to run it under gdb and get a backtrace
from the point where the SIGABRT occurs. You can also bisect by rolling back
to an older snapshot to see if you can locate the change that triggered
the issue.
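
To make the log reading above repeatable on a longer log, here is a minimal
sketch in Python that pulls out every "terminated by signal" line together with
the most recent statement logged before it. The function name
find_signal_deaths is hypothetical, and the line format is assumed from the two
lines quoted above. Note that the last logged statement is only a hint: it may
belong to a different backend than the one that died.

```python
import re

# Matches the postmaster line reporting a backend killed by a signal
# (format assumed from the log excerpt quoted in this thread).
SIGNAL_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by signal (\d+): (.+)"
)


def find_signal_deaths(log_lines):
    """Return (pid, signal number, signal name, last logged statement) tuples."""
    deaths = []
    last_statement = None
    for raw in log_lines:
        line = raw.strip()
        # Remember the most recent statement logged by log_statement.
        if line.startswith("LOG:") and "statement:" in line:
            last_statement = line.split("statement:", 1)[1].strip()
        match = SIGNAL_RE.search(line)
        if match:
            pid, signo, name = match.groups()
            deaths.append((int(pid), int(signo), name, last_statement))
    return deaths


sample = [
    "LOG:  statement: SELECT ... ORDER BY c.oid",
    "LOG:  server process (PID 11531) was terminated by signal 6: Abort trap",
]
deaths = find_signal_deaths(sample)
print(deaths)
```

Signal 6 is SIGABRT, so any entry with that number is a candidate for the
stack protector or another abort() path.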

--

    "Action without study is fatal. Study without action is futile."
        -- Mary Ritter Beard


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Stuart Henderson
On 2015-02-14, Joel Sing <[hidden email]> wrote:
> The interesting/useful part is:
>
> LOG:  statement: SELECT ... ORDER BY c.oid
> LOG:  server process (PID 11531) was terminated by signal 6: Abort trap
>
> So the server process is being sent a SIGABRT, which is causing it to
> terminate. There is a good chance that this is coming from the stack
> protector, which sends a SIGABRT if the stack is smashed.

Oh, good call. It could also be a backwards memcpy which would show
up in /var/log/messages (assuming usual config).

If it were another program, our strict mutex checks can also cause
SIGABRT, but that won't apply to pgsql as it's not threaded.
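
For anyone unfamiliar with what a "backwards memcpy" check looks for, here is a
minimal sketch in Python that models the two buffers as integer addresses and
tests whether the source and destination ranges overlap. This is an
illustration of the idea only, not the libc source; the name regions_overlap is
hypothetical, and treating identical pointers and zero-length copies as
harmless is an assumption of the sketch.

```python
def regions_overlap(dst, src, length):
    """Model of an overlap test for a copy of `length` bytes.

    dst and src are integer addresses. Returns True when the ranges
    [dst, dst + length) and [src, src + length) partially overlap,
    which is the situation memcpy() does not allow.
    """
    # Assumption of this sketch: nothing to copy, or copying a buffer
    # onto itself, is treated as harmless.
    if length == 0 or dst == src:
        return False
    return (dst < src and dst + length > src) or (
        src < dst and src + length > dst
    )


# Source starts 4 bytes into an 8-byte destination range: overlap.
print(regions_overlap(100, 104, 8))
# Source starts exactly where the destination range ends: no overlap.
print(regions_overlap(100, 108, 8))
```

Code that genuinely needs to copy between overlapping buffers should use
memmove() instead.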


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Hugo Osvaldo Barrera-2
On 2015-02-14 13:29, Stuart Henderson wrote:

> Oh, good call. It could also be a backwards memcpy which would show
> up in /var/log/messages (assuming usual config).
>

Yup, backward memcpy it is (from /var/log/messages):

    Feb 14 12:27:34 elysion postgres: backwards memcpy
    Feb 14 12:28:10 elysion last message repeated 8 times
    Feb 14 12:30:19 elysion last message repeated 28 times
    Feb 14 12:40:28 elysion last message repeated 128 times
    Feb 14 12:50:40 elysion last message repeated 128 times
    Feb 14 13:00:41 elysion last message repeated 126 times
    Feb 14 13:10:42 elysion last message repeated 128 times
    Feb 14 13:20:49 elysion last message repeated 126 times
    Feb 14 13:30:55 elysion last message repeated 128 times
    Feb 14 13:41:06 elysion last message repeated 132 times
    Feb 14 13:51:10 elysion last message repeated 128 times
    Feb 14 14:01:18 elysion last message repeated 128 times
    Feb 14 14:08:18 elysion last message repeated 91 times
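
To get a feel for how often this fires, here is a minimal sketch in Python that
totals the occurrences of a message, including the "last message repeated N
times" lines that follow it. The function name count_occurrences is
hypothetical, and the sketch assumes that a repeat line always refers to the
message logged immediately before it.

```python
import re

REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_occurrences(syslog_lines, needle):
    """Count how often `needle` was logged, including collapsed repeats."""
    total = 0
    counting = False  # True while the latest real message contained needle
    for line in syslog_lines:
        repeat = REPEAT_RE.search(line)
        if repeat:
            # Assumption: a repeat line refers to the preceding message.
            if counting:
                total += int(repeat.group(1))
        else:
            counting = needle in line
            if counting:
                total += 1
    return total


sample = [
    "Feb 14 12:27:34 elysion postgres: backwards memcpy",
    "Feb 14 12:28:10 elysion last message repeated 8 times",
    "Feb 14 12:30:19 elysion last message repeated 28 times",
]
total = count_occurrences(sample, "backwards memcpy")
print(total)
```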

Am I mistaken in understanding that this is an issue with postgresql itself,
and not a local configuration error?

I tried building postgres with debug symbols (I added the flags described
here[1] to the ports Makefile), but the backtrace is still useless:

    # sudo -u _postgresql gdb -q -c postgres.core /usr/local/bin/postgres
    Core was generated by `postgres'.
    Program terminated with signal 6, Aborted.
    Loaded symbols for /usr/local/bin/postgres
    #0  0x00000bd73424292a in ?? ()
    (gdb) bt
    #0  0x00000bd73424292a in ?? ()
    #1  0x0000000000000000 in ?? ()

Do I need any further OpenBSD-specific changes to get a useful backtrace? (I
have to admit that I'm not too familiar with debugging with gdb on any
platform.)

Thanks for all the feedback so far!

[1]: https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Debugging_the_core_dump_-_example

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Stuart Henderson
On 2015-02-15, Hugo Osvaldo Barrera <[hidden email]> wrote:
>
> Am I mistaken in understanding that this is an issue with postgresql itself,
> and not a local configuration error?

Correct.

> I tried building postgres with debug symbols (I added the flags described
> here[1] to the ports Makefile), but the backtrace is still useless:

Please would you rebuild from the original port like this:

make clean=all
make DEBUG="-O0 -g" repackage && sudo make reinstall

and see if this gives a better backtrace.


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Hugo Osvaldo Barrera-2
On 2015-02-16 16:24, Stuart Henderson wrote:
> Please would you rebuild from the original port like this:
>
> make clean=all
> make DEBUG="-O0 -g" repackage && sudo make reinstall
>
> and see if this gives a better backtrace.
>

Thanks a lot, it did. I was unaware of make DEBUG, and had been editing the
Makefile with no success.

  (gdb) bt
  #0  0x0000110a2815b92a in kill () at <stdin>:2
  #1  0x0000110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
  #2  0x0000110a2816a238 in memcpy (dst0=0xfb8d4, src0=0x6, length=0) at /usr/src/lib/libc/string/memcpy.c:65
  #3  0x000011080cf8d1b1 in check_ip (raddr=0x110a899f7918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
  #4  0x000011080cf90a04 in check_hba (port=0x110a899f7800) at hba.c:1718
  #5  0x000011080cf91d34 in hba_getauthmethod (port=0x110a899f7800) at hba.c:2256
  #6  0x000011080cf88eb3 in ClientAuthentication (port=0x110a899f7800) at auth.c:307
  #7  0x000011080d1edf5d in PerformAuthentication (port=0x110a899f7800) at postinit.c:223
  #8  0x000011080d1eeae7 in InitPostgres (in_dbname=0x110af4508c00 "virtstart-dev", dboid=0,
      username=0x110af4508be0 "virtstart-dev", out_dbname=0x0) at postinit.c:688
  #9  0x000011080d0a3eb1 in PostgresMain (argc=1, argv=0x110af4508c20, dbname=0x110af4508c00 "virtstart-dev",
      username=0x110af4508be0 "virtstart-dev") at postgres.c:3749
  #10 0x000011080d033537 in BackendRun (port=Could not find the frame base for "BackendRun".
  ) at postmaster.c:4155
  #11 0x000011080d032be8 in BackendStartup (port=0x110a899f7800) at postmaster.c:3829
  #12 0x000011080d02f2d0 in ServerLoop () at postmaster.c:1597
  #13 0x000011080d02e968 in PostmasterMain (argc=3, argv=0x7f7ffffd9658) at postmaster.c:1244
  #14 0x000011080cf96dc8 in main (argc=Could not find the frame base for "main".
  ) at main.c:228
  Current language:  auto; currently asm
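
As a reading aid for a trace like this, here is a minimal sketch in Python that
lists the function name of each frame and reports which function called a given
callee (here memcpy). The names frame_functions and caller_of are hypothetical,
and the frame format is assumed from the gdb output above.

```python
import re

# Frame header as printed by gdb in the trace above: "#N  0xADDR in func (".
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+ in (\w+) \(")


def frame_functions(backtrace_text):
    """Return function names in frame order, innermost frame first."""
    return [name for _, name in FRAME_RE.findall(backtrace_text)]


def caller_of(backtrace_text, callee):
    """Return the function in the frame just above `callee`, or None."""
    names = frame_functions(backtrace_text)
    if callee in names:
        index = names.index(callee)
        if index + 1 < len(names):
            return names[index + 1]
    return None


sample = """\
#0  0x0000110a2815b92a in kill () at <stdin>:2
#1  0x0000110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
#2  0x0000110a2816a238 in memcpy (dst0=0xfb8d4, src0=0x6, length=0) at
/usr/src/lib/libc/string/memcpy.c:65
#3  0x000011080cf8d1b1 in check_ip (raddr=0x110a899f7918,
addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
"""
names = frame_functions(sample)
caller = caller_of(sample, "memcpy")
print(names, caller)
```

Because the pattern only looks at frame headers, it works whether or not the
mail client has wrapped the long lines.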

This doesn't say much to me though. I guess my best shot is to post this to
the postgresql list, right?

Thanks,

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Stuart Henderson
On 2015/02/16 17:19, Hugo Osvaldo Barrera wrote:
>   (gdb) bt

Was this backtrace from a new coredump, or was it from one created by
the old binary? (if the latter, please could you remove the old coredump
and get it to crash again and send a fresh backtrace?)


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Hugo Osvaldo Barrera-2
On 2015-02-16 21:02, Stuart Henderson wrote:
> On 2015/02/16 17:19, Hugo Osvaldo Barrera wrote:
> >   (gdb) bt
>
> Was this backtrace from a new coredump, or was it from one created by
> the old binary? (if the latter, please could you remove the old coredump
> and get it to crash again and send a fresh backtrace?)
>

My pg_hba is the stock one (since it had also been deleted):
http://sprunge.us/ZdQI

It was a brand-new core dump, since I had deleted /var/postgresql right before
generating it. I regenerated it just to be sure, and it's the same:

  (gdb) bt
  #0  0x0000110a2815b92a in kill () at <stdin>:2
  #1  0x0000110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
  #2  0x0000110a2816a238 in memcpy (dst0=0xf81bf, src0=0x6, length=0) at /usr/src/lib/libc/string/memcpy.c:65
  #3  0x000011080cf8d1b1 in check_ip (raddr=0x110abc279918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
  #4  0x000011080cf90a04 in check_hba (port=0x110abc279800) at hba.c:1718
  #5  0x000011080cf91d34 in hba_getauthmethod (port=0x110abc279800) at hba.c:2256
  #6  0x000011080cf88eb3 in ClientAuthentication (port=0x110abc279800) at auth.c:307
  #7  0x000011080d1edf5d in PerformAuthentication (port=0x110abc279800) at postinit.c:223
  #8  0x000011080d1eeae7 in InitPostgres (in_dbname=0x110ad7782be0 "virtstart-dev", dboid=0,
      username=0x110ad7782bc0 "virtstart-dev", out_dbname=0x0) at postinit.c:688
  #9  0x000011080d0a3eb1 in PostgresMain (argc=1, argv=0x110ad7782c00, dbname=0x110ad7782be0 "virtstart-dev",
      username=0x110ad7782bc0 "virtstart-dev") at postgres.c:3749
  #10 0x000011080d033537 in BackendRun (port=Could not find the frame base for "BackendRun".
  ) at postmaster.c:4155
  #11 0x000011080d032be8 in BackendStartup (port=0x110abc279800) at postmaster.c:3829
  #12 0x000011080d02f2d0 in ServerLoop () at postmaster.c:1597
  #13 0x000011080d02e968 in PostmasterMain (argc=3, argv=0x7f7ffffd9658) at postmaster.c:1244
  #14 0x000011080cf96dc8 in main (argc=Could not find the frame base for "main".
  ) at main.c:228
  Current language:  auto; currently asm

Thanks,

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Stuart Henderson
In reply to this post by Stuart Henderson
On 2015/02/16 21:02, Stuart Henderson wrote:
> On 2015/02/16 17:19, Hugo Osvaldo Barrera wrote:
> >   (gdb) bt
>
> Was this backtrace from a new coredump, or was it from one created by
> the old binary? (if the latter, please could you remove the old coredump
> and get it to crash again and send a fresh backtrace?)
>

OK, replicated it here now...


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Hugo Osvaldo Barrera-2
In reply to this post by Hugo Osvaldo Barrera-2
On 2015-02-16 20:44, Stuart Henderson wrote:
> > Thanks a lot, it did. I was unaware of make DEBUG, and had been editing the
> > Makefile with no success.
>
> The missing piece is that, normally, binaries get stripped of their
> debug symbols in the "fake install" stage. Passing the flags in via DEBUG
> (in most cases) avoids this step.
>
> Could you let me have a copy of your pg_hba.conf please? Looking at the
> trace and code it's a bit odd and I'd like to try and replicate it here if
> I can ..
>

After submitting the backtrace upstream (to the pgsql list), it would seem
that it's a bug in the postgres codebase. It was apparently triggered by the
OpenBSD upgrade, but it is nonetheless an issue in pg itself:

  http://www.postgresql.org/message-id/16513.1424120138@...

I'll post back (for posterity's sake) once I have a permanent fix.

Thanks a bunch for helping me track the issue down and get a proper
backtrace.

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Amit Kulkarni-5
In reply to this post by Hugo Osvaldo Barrera-2
On Mon, Feb 16, 2015 at 2:19 PM, Hugo Osvaldo Barrera <[hidden email]> wrote:

> On 2015-02-16 16:24, Stuart Henderson wrote:
> > On 2015-02-15, Hugo Osvaldo Barrera <[hidden email]> wrote:
> > >
> > > Am I mistaken in understanding that this is an issue with postgresql itself,
> > > and not a local configuration error?
> >
> > Correct.
> >
> > > I tried building postgres with debug symbols (I added the flags described
> > > here[1] to the ports Makefile), but the backtrace is still useless:
> >
> > Please would you rebuild from the original port like this:
> >
> > make clean=all
> > make DEBUG="-O0 -g" repackage && sudo make reinstall
> >
> > and see if this gives a better backtrace.
> >
>
> Thanks a lot, it did. I was unaware of make DEBUG, and had been editing the
> Makefile with no success.
>
>   (gdb) bt
>   #0  0x0000110a2815b92a in kill () at <stdin>:2
>   #1  0x0000110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
>   #2  0x0000110a2816a238 in memcpy (dst0=0xfb8d4, src0=0x6, length=0) at /usr/src/lib/libc/string/memcpy.c:65
>   #3  0x000011080cf8d1b1 in check_ip (raddr=0x110a899f7918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
>   #4  0x000011080cf90a04 in check_hba (port=0x110a899f7800) at hba.c:1718
>   #5  0x000011080cf91d34 in hba_getauthmethod (port=0x110a899f7800) at hba.c:2256
>   #6  0x000011080cf88eb3 in ClientAuthentication (port=0x110a899f7800) at auth.c:307
>   #7  0x000011080d1edf5d in PerformAuthentication (port=0x110a899f7800) at postinit.c:223
>   #8  0x000011080d1eeae7 in InitPostgres (in_dbname=0x110af4508c00 "virtstart-dev", dboid=0,
>       username=0x110af4508be0 "virtstart-dev", out_dbname=0x0) at postinit.c:688
>   #9  0x000011080d0a3eb1 in PostgresMain (argc=1, argv=0x110af4508c20, dbname=0x110af4508c00 "virtstart-dev",
>       username=0x110af4508be0 "virtstart-dev") at postgres.c:3749
>   #10 0x000011080d033537 in BackendRun (port=Could not find the frame base for "BackendRun".
>   ) at postmaster.c:4155
>   #11 0x000011080d032be8 in BackendStartup (port=0x110a899f7800) at postmaster.c:3829
>   #12 0x000011080d02f2d0 in ServerLoop () at postmaster.c:1597
>   #13 0x000011080d02e968 in PostmasterMain (argc=3, argv=0x7f7ffffd9658) at postmaster.c:1244
>   #14 0x000011080cf96dc8 in main (argc=Could not find the frame base for "main".
>   ) at main.c:228
>   Current language:  auto; currently asm
>
> This doesn't say much to me though. I guess my best shot is to post this at
> the postgresql list, right?
>
> Thanks,
>

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/hba.c;h=9cde6a21ce99003102dc9303288001d24e3ba2b6;hb=HEAD#l703

One of these lines is the offending one...
Refer to http://www.tedunangst.com/flak/post/memcpy-vs-memmove

Guys, please correct me if I am wrong. There might be more such bugs in
postgres; I'm not sure why others are not hitting them.
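
For readers following along, here is a minimal C sketch of the kind of
overlap check a strict memcpy can make before copying. It is an illustration
under assumptions, not the actual OpenBSD libc source: the names
regions_overlap and copy_disjoint are made up for this example, and assert()
stands in for the abort() seen in frame #1 of the backtrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the len-byte regions starting at
 * dst and src share at least one byte, 0 otherwise.
 */
static int
regions_overlap(const void *dst, const void *src, size_t len)
{
	uintptr_t d = (uintptr_t)dst;
	uintptr_t s = (uintptr_t)src;
	uintptr_t gap = d > s ? d - s : s - d;

	/* Two len-byte regions overlap when their starts are closer than len. */
	return len != 0 && gap < len;
}

/*
 * Hypothetical strict copy: refuses (via assert) to copy overlapping
 * regions, a case memcpy() leaves undefined. memmove() is the call
 * that permits overlap.
 */
static void *
copy_disjoint(void *dst, const void *src, size_t len)
{
	assert(!regions_overlap(dst, src, len));
	return memcpy(dst, src, len);
}
```

In check_ip the source was &addr, the address of a pointer variable on the
stack, and the destination was a sockaddr_storage on the same stack, so a
copy of sizeof(addrcopy) bytes could plausibly span both regions and trip a
check like this one.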


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Jeremie Courreges-Anglas-2
In reply to this post by Hugo Osvaldo Barrera-2
Please try the diff below.  It fixes the "backwards memcpy" problem
easily noticeable with psql -h ::1.

$OpenBSD$
--- src/backend/libpq/hba.c.orig Mon Feb 16 21:53:21 2015
+++ src/backend/libpq/hba.c Mon Feb 16 21:54:44 2015
@@ -700,8 +700,8 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
  struct sockaddr_storage addrcopy,
  maskcopy;
 
- memcpy(&addrcopy, &addr, sizeof(addrcopy));
- memcpy(&maskcopy, &mask, sizeof(maskcopy));
+ memcpy(&addrcopy, addr, addr->sa_len);
+ memcpy(&maskcopy, mask, mask->sa_len);
  pg_promote_v4_to_v6_addr(&addrcopy);
  pg_promote_v4_to_v6_mask(&maskcopy);
 


--
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE


Re: postgresql-server exiting abnormally after upgrade to -snapshot

Jeremie Courreges-Anglas-2
[hidden email] (Jérémie Courrèges-Anglas) writes:

> Please try the diff below.  It fixes the "backwards memcpy" problem
> easily noticeable with psql -h ::1.

Updated diff. Thanks to Stuart for reminding me that a netmask's sa_len
value can be quite surprising.

$OpenBSD$
--- src/backend/libpq/hba.c.orig Mon Feb 16 21:53:21 2015
+++ src/backend/libpq/hba.c Mon Feb 16 23:08:38 2015
@@ -700,8 +700,13 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
  struct sockaddr_storage addrcopy,
  maskcopy;
 
- memcpy(&addrcopy, &addr, sizeof(addrcopy));
- memcpy(&maskcopy, &mask, sizeof(maskcopy));
+ memcpy(&addrcopy, addr, sizeof(struct sockaddr_in));
+ /*
+ * On some OSes, if mask is obtained from eg. getifaddrs(3), sa_len
+ * can vary wildly. We already know that addr->sa_family == AF_INET,
+ * so just use sizeof(struct sockaddr_in).
+ */
+ memcpy(&maskcopy, mask, sizeof(struct sockaddr_in));
  pg_promote_v4_to_v6_addr(&addrcopy);
  pg_promote_v4_to_v6_mask(&maskcopy);
 
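
As a standalone illustration of what the updated diff does, here is a small C
sketch that copies an IPv4 sockaddr into a sockaddr_storage, passing the
pointer itself (not its address) and a fixed sizeof(struct sockaddr_in). The
function name copy_inet_addr is hypothetical, and the memset and the assert
are additions of this sketch; none of them is part of the patch.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy an AF_INET address into a sockaddr_storage.
 * The length is the fixed size of struct sockaddr_in rather than
 * src->sa_len or sizeof(*dst), so the read never goes past the source.
 */
static void
copy_inet_addr(struct sockaddr_storage *dst, const struct sockaddr *src)
{
	/* The fixed size is only valid for IPv4 addresses. */
	assert(src->sa_family == AF_INET);

	/* Zero the remainder of the larger destination (sketch only). */
	memset(dst, 0, sizeof(*dst));

	/* Note: src, not &src. Copy the address, not the pointer variable. */
	memcpy(dst, src, sizeof(struct sockaddr_in));
}
```

Called with a sockaddr_in for 127.0.0.1, the destination ends up with the
same family, port and address, and nothing beyond the source structure is
read.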


--
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
