PostgreSQL for VAX on NetBSD/OpenBSD

52 messages

Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

Johnny Billquist
On 2014-06-29 12:12, Dave McGuire wrote:

> On 06/29/2014 03:10 PM, Patrick Finnegan wrote:
>> And it also runs on the 11/780 which can have multiple CPUs... but I've
>> never seen support for using more than one CPU (and the NetBSD page
>> still says "NetBSD/vax can only make use of one CPU on multi-CPU
>> machines").  If that has changed, I'd love to hear about it.  Support
>> for my VAX 6000 would also be nice...
>
>    It changed well over a decade ago, if memory serves.  The specific
> work was done on a VAX-8300 or -8350.  I'm pretty sure the 11/780's
> specific flavor of SMP is not supported.  (though I do have a pair of
> 11/785s here...wanna come hack? ;))

Well, VAX-11/78x do not support SMP; they have (had) ASMP only.

        Johnny


Re: PostgreSQL for VAX on NetBSD/OpenBSD

Robert Haas
In reply to this post by Robert Haas
On Wed, Jul 16, 2014 at 11:45 PM, Thor Lancelot Simon <[hidden email]> wrote:
> Well, I have to ask this question: why should there be any "vax-specific
> code"?  What facilities beyond what POSIX with the threading extensions
> offers on a modern system do you really need?  Why?

We have a spinlock implementation.  When spinlocks are not available,
we have to fall back to using semaphores, which is much slower.
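To make the contrast concrete, here is a minimal sketch of a test-and-set spinlock, assuming C11 `<stdatomic.h>` and POSIX `sched_yield()`. It is not PostgreSQL's s_lock implementation, and the names (`spin_lock_t`, `SPINS_BEFORE_YIELD`) are hypothetical. The uncontended path is a single atomic instruction in user space; a semaphore-based fallback generally has to involve the kernel instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tuning knob: busy-wait this many times before yielding. */
#define SPINS_BEFORE_YIELD 100

typedef struct {
    atomic_flag flag;   /* set = held, clear = free */
} spin_lock_t;

void spin_init(spin_lock_t *l) {
    atomic_flag_clear(&l->flag);
}

/* One attempt: returns true if we acquired the lock. */
bool spin_try_lock(spin_lock_t *l) {
    /* test-and-set returns the previous value; false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire);
}

void spin_lock(spin_lock_t *l) {
    int spins = 0;
    while (!spin_try_lock(l)) {
        if (++spins >= SPINS_BEFORE_YIELD) {
            sched_yield();  /* let the current holder run */
            spins = 0;
        }
    }
}

void spin_unlock(spin_lock_t *l) {
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

On a platform without a usable atomic test-and-set (or an emulation of one), every `spin_lock` would have to become a semaphore wait, which is the slowdown described above.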

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: PostgreSQL for VAX on NetBSD/OpenBSD

Greg Stark
In reply to this post by Robert Haas
On Thu, Jul 17, 2014 at 4:45 AM, Thor Lancelot Simon <[hidden email]> wrote:
> Except, of course, for IEEE floating point, because the VAX's floating point
> unit simply does not provide that

Actually, I think that's relevant. We usually get focused on
concurrency because that's an area where architectures vary a lot, but
it sounds like the VAX barely supports multiple CPUs, and older
architectures generally had fairly mundane concurrency semantics since
they were designed to work with existing toolchains. From memory, it
wasn't until later SPARC chips and Alpha that people started to
experiment with looser concurrency models and to expect the toolchains
to satisfy complex constraints to make them work.

But imho the interesting thing about supporting some older
architectures is for things like smoking out assumptions that math is
IEEE floating point or whatever caused initdb to generate an initial
config that couldn't start due to requiring too much memory.

There could also be interesting(ish) performance losses if we're using
lots of floating point math on a machine where floating point is
emulated or perhaps using lots of 64-bit integers on a machine where
it's implemented by the compiler using 32-bit operations. I don't
think we're too concerned about performance on older architectures but
if it's easy enough to avoid we might want to. Or at least we might
want to know what architectures can't reasonably run a database due to
these kinds of issues.

--
greg


Re: PostgreSQL for VAX on NetBSD/OpenBSD

Johnny Billquist
On 2014-07-17 16:53, Greg Stark wrote:

> On Thu, Jul 17, 2014 at 4:45 AM, Thor Lancelot Simon <[hidden email]> wrote:
>> Except, of course, for IEEE floating point, because the VAX's floating point
>> unit simply does not provide that
>
> Actually I think that's relevant. We usually get focused on the
> concurrency because that's an area where architectures vary a lot but
> it sounds like VAX barely supports multiple CPUs and generally older
> architectures had fairly mundane concurrency semantics since they were
> designed to work with existing toolchains. From memory it wasn't until
> later Sparc chips and Alpha that people started to experiment with
> looser concurrency models and expecting the toolchains to satisfy
> complex constraints to make them work.

Well, VAXen support multiple CPUs just fine. However, NetBSD/vax barely
has support for it. That could of course change with time, as there are
plenty of multi-CPU machines around. We just need to add support for
them in NetBSD...

Also, VAX did not use CAS as the general paradigm for atomic writes and
so on, but has other explicit instructions that are guaranteed to be
atomic. NetBSD/vax doesn't use the VAX-specific instructions, but
emulates CAS in the kernel instead. I don't remember how that extends to
userland, though. It's obviously easiest if userland programs use the
pthread library functions, which are guaranteed to work right even in a
multiprocessor environment.
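To illustrate what "CAS as the general paradigm" means, here is a sketch, assuming C11 atomics, of building an atomic fetch-and-add out of nothing but compare-and-swap. Per Johnny's description, on NetBSD/vax the CAS itself would be emulated by the kernel rather than executed as a single instruction; `cas_fetch_add` is a hypothetical name.

```c
#include <stdatomic.h>

/* Atomically add `delta` to *p using only compare-and-swap.
   Returns the value *p held just before our update. */
long cas_fetch_add(atomic_long *p, long delta) {
    long old = atomic_load_explicit(p, memory_order_relaxed);
    /* On failure, compare_exchange writes the current value of *p into
       `old`, so each retry recomputes old + delta from fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old + delta)) {
        /* another writer got in first; retry */
    }
    return old;
}
```

The retry loop is why CAS-only designs can degrade under contention, whereas dedicated interlocked instructions of the kind Johnny mentions complete in one step.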

Implementing your own spinlocks is of course possible, but a horrible
way to use machine resources in userland.

        Johnny

--
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: [hidden email]             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol


Re: PostgreSQL for VAX on NetBSD/OpenBSD

Greg Stark
On Thu, Jul 17, 2014 at 4:04 PM, Johnny Billquist <[hidden email]> wrote:
> Also, VAX did not use CAS as the general paradigm for atomic writes and so
> on, but have other explicit instructions that are guaranteed to be atomic.
> NetBSD/vax don't use the VAX specific instructions, but emulates CAS in the
> kernel instead. But I don't remember how that extends to userland. It's
> obviously easiest if userland programs use the pthread library functions,
> which are guaranteed to work right even in multiprocessor environment.

pthread functions may work by accident in shared memory but there's no
way to be sure they won't depend on some pthread threading data
structures. In short, if you don't use pthreads you can't really count
on pthread functions to work.
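For reference, POSIX does define an explicit way to place a mutex in memory shared between processes: the `PTHREAD_PROCESS_SHARED` attribute, set with `pthread_mutexattr_setpshared()`. Whether a given platform (NetBSD/vax in 2015 included) actually supports it is a separate question that this thread does not settle. The sketch below, assuming an anonymous shared `mmap()`, only shows the shape of the API, and it reports failure rather than assuming support. `shared_mutex_create` and `shared_mutex_destroy` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Allocate a mutex in anonymous shared memory so that it is visible to
   both parent and child after fork(). Returns NULL on failure, including
   when the platform rejects PTHREAD_PROCESS_SHARED. */
pthread_mutex_t *shared_mutex_create(void) {
    void *mem = mmap(NULL, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    pthread_mutex_t *m = mem;
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(mem, sizeof(pthread_mutex_t));
        return NULL;
    }
    int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(mem, sizeof(pthread_mutex_t));
        return NULL;
    }
    return m;
}

void shared_mutex_destroy(pthread_mutex_t *m) {
    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
}
```

Using a plain default mutex in shared memory, by contrast, is exactly the "may work by accident" case Greg warns about.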

We did experiment a while back with using futexes on Linux instead of
our spinlocks but the experiments didn't seem to work out.

--
greg


Re: PostgreSQL for VAX on NetBSD/OpenBSD

Greg Stark
In reply to this post by John Klos
On Wed, Jun 25, 2014 at 6:05 PM, John Klos <[hidden email]> wrote:
> While I wouldn't be surprised if you remove the VAX code because not many
> people are going to be running PostgreSQL, I'd disagree with the assessment
> that this port is broken. It compiles, it initializes databases, it runs, et
> cetera, albeit not with the default postgresql.conf.


So I've been playing with this a bit. I have simh running on my home
server as a VAX 3900 with NetBSD 6.1.5. My home server was mainly
intended to be a SAN and its CPU is woefully underpowered, so the
resulting VAX is actually very, very slow. So slow that I wonder if
there's a bug in the emulator, but anyway.

I'm coming to the conclusion that the port doesn't really work
practically speaking but the failures are more interesting than I
expected. They come in a few varieties:

1) VAX does not have IEEE FP. This manifests in a few ways, some of
which may be our own bugs or missing expected outputs. The numeric
data type maths often produce numbers rounded off differently, and the
floating point tests print numbers one digit shorter than our expected
outputs, sometimes in scientific notation where we don't expect it.
The overflow tests generate floating point exceptions rather than
overflows. Infinity and NaN don't work. The JSON code in particular
generates large numbers where +/- Infinity literals are supplied.

There are some planner tests that fail with floating point exceptions
-- that's probably a bug on our part. And I've seen at least one
server crash (maybe two) apparently caused by one as well which I
don't believe is expected.
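On a non-IEEE format, code that detects overflow by checking the result for infinity has nothing to check: the operation traps (or saturates) first. Here is a minimal sketch, assuming only `<float.h>` and `<math.h>`, of testing for overflow before multiplying, so the check never relies on infinities existing. `checked_mul` is a hypothetical name, not a PostgreSQL function, and it deliberately ignores NaN and underflow.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Returns false (leaving *out untouched) if |a * b| would exceed DBL_MAX.
   The test divides instead of multiplying, so it cannot itself overflow.
   Results exactly at the boundary are subject to rounding in the division. */
bool checked_mul(double a, double b, double *out) {
    double fa = fabs(a), fb = fabs(b);
    /* If |b| <= 1 then |a*b| <= |a| <= DBL_MAX, so no overflow is possible. */
    if (fb > 1.0 && fa > DBL_MAX / fb)
        return false;
    *out = a * b;
    return true;
}
```

The same pre-check pattern works for addition (compare against `DBL_MAX - |b|`), which keeps the overflow path portable to both IEEE and VAX formats.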

2) The initdb problem is actually not our fault. It looks like a
NetBSD kernel bug when allocating large shared memory blocks on a
machine without lots of memory. There's not much initdb can do with a
kernel panic... BSD still has the problem of kern.maxfiles defaulting
to a value low enough that even two connections cause the regression
tests to run out of file descriptors. That's documented, and it would
be a right pain for initdb to detect that case.

3) The tests take so long to run that autovacuum kicks in and the
tests start producing rows in inconsistent orderings. I assume that's
a problem we've run into on the CLOBBER_CACHE animals as well?

4) One of the tablesample tests seems to freeze indefinitely. I
haven't looked into why yet. That might indeed indicate that the
spinlock code isn't working?

So my conclusion tentatively is that while the port doesn't actually
work practically speaking it is nevertheless uncovering some
interesting bugs.

--
greg


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

Tom Lane
Greg Stark <[hidden email]> writes:
> So I've been playing with this a bit. I have simh running on my home
> server as a Vax  3900 with NetBSD 6.1.5. My home server was mainly
> intended to be a SAN and its cpu is woefully underpowered so the
> resulting VAX is actually very very slow. So slow I wonder if there's
> a bug in the emulator but anyways.

Fun fun!

> There are some planner tests that fail with floating point exceptions
> -- that's probably a bug on our part. And I've seen at least one
> server crash (maybe two) apparently caused by one as well which I
> don't believe is expected.

That seems worth poking into.

> 3) The tests take so long to run that autovacuum kicks in and the
> tests start producing rows in inconsistent orderings. I assume that's
> a problem we've run into on the CLOBBER_CACHE animals as well?

I'd tentatively bet that it's more like planner behavioral differences
due to different floating-point roundoff.

> 4) One of the tablesample tests seems to freeze indefinitely. I
> haven't looked into why yet. That might indeed indicate that the
> spinlock code isn't working?

The tablesample tests seem like a not-very-likely first place for such a
thing to manifest.  What I'm thinking is that there are places in there
where we loop till we get an expected result.  Offhand I thought they were
all integer math; but if one was float and the VAX code wasn't doing what
was expected, maybe we could blame this on float discrepancies as well.

                        regards, tom lane


Re: PostgreSQL for VAX on NetBSD/OpenBSD

David Brownlee
In reply to this post by Greg Stark
On 20 August 2015 at 14:54, Greg Stark <[hidden email]> wrote:

> On Wed, Jun 25, 2014 at 6:05 PM, John Klos <[hidden email]> wrote:
>> While I wouldn't be surprised if you remove the VAX code because not many
>> people are going to be running PostgreSQL, I'd disagree with the assessment
>> that this port is broken. It compiles, it initializes databases, it runs, et
>> cetera, albeit not with the default postgresql.conf.
>
> So I've been playing with this a bit. I have simh running on my home
> server as a Vax  3900 with NetBSD 6.1.5. My home server was mainly
> intended to be a SAN and its cpu is woefully underpowered so the
> resulting VAX is actually very very slow. So slow I wonder if there's
> a bug in the emulator but anyways.

I've run NetBSD/vax in simh on a laptop with a 2.5GHz i5-2520M and it
feels "reasonably fast for a VAX" (make of that what you will :)

> I'm coming to the conclusion that the port doesn't really work
> practically speaking but the failures are more interesting than I
> expected. They come in a few varieties:

Mmm. Edge cases and failing (probably reasonable :) assumptions.

> 1) Vax does not have IEEE fp. This manifests in a few ways, some of
> which may be our own bugs or missing expected outputs. The numeric
> data type maths often produce numbers rounded off differently, the
> floating point tests print numbers one digit shorter than our expected
> results expect and sometimes in scientific notation where we don't
> expect. The overflow tests generate floating point exceptions rather
> than overflows. Infinity and NaN don't work. The Json code in
> particular generates large numbers where +/- Infinity literals are
> supplied.
>
> There are some planner tests that fail with floating point exceptions
> -- that's probably a bug on our part. And I've seen at least one
> server crash (maybe two) apparently caused by one as well which I
> don't believe is expected.

Sounds like some useful test cases there.

> 2) The initdb problem is actually not our fault. It looks like a
> NetBSD kernel bug when allocating large shared memory blocks on a
> machine without lots of memory. There's not much initdb can do with a
> kernel panic...

That should definitely be fixed...

>                        BSD still has the problem of kern.maxfiles defaulting
> to a value low enough that even two connections causes the regression
> tests to run out of file descriptors. That's documented and it would
> be a right pain for initdb to detect that case.

Is initdb calling ulimit() to check/set open files? It's probably worth
it as a sanity check if nothing else.

I think the VAX default open_max is 128. The 'bigger' ports have a
default of 1024, and I think they should probably all be updated to
that, though that is orthogonal to a ulimit() check.

> 3) The tests take so long to run that autovacuum kicks in and the
> tests start producing rows in inconsistent orderings. I assume that's
> a problem we've run into on the CLOBBER_CACHE animals as well?

Can the autovacuum daemon settings be bumped/disabled while running the tests?

> 4) One of the tablesample tests seems to freeze indefinitely. I
> haven't looked into why yet. That might indeed indicate that the
> spinlock code isn't working?
>
> So my conclusion tentatively is that while the port doesn't actually
> work practically speaking it is nevertheless uncovering some
> interesting bugs.

Good to hear. Looks like bugs in both the OS and software side, so fun for all.

Thanks for taking the time to do this!

David


Re: PostgreSQL for VAX on NetBSD/OpenBSD

Greg Stark
On Thu, Aug 20, 2015 at 4:13 PM, David Brownlee <[hidden email]> wrote:
>> 2) The initdb problem is actually not our fault. It looks like a
>> NetBSD kernel bug when allocating large shared memory blocks on a
>> machine without lots of memory. There's not much initdb can do with a
>> kernel panic...
>
> That should definitely be fixed...

cf
http://mail-index.netbsd.org/port-vax/2015/08/19/msg002524.html
http://comments.gmane.org/gmane.os.netbsd.ports.vax/5773

It's possible it's a simh bug, but it smells more like a simple overflow to me.

>>                        BSD still has the problem of kern.maxfiles defaulting
>> to a value low enough that even two connections causes the regression
>> tests to run out of file descriptors. That's documented and it would
>> be a right pain for initdb to detect that case.
>
> Is initdb calling ulimit() to check/set open files? Its probably worth
> it as a sanity check if nothing else.

Yup, we do that.

> I think the VAX default open_max is 128. The 'bigger' ports have a
> default of 1024, and I think they should probably all be updated to
> that, though that is orthogonal to a ulimit() check.

That's the problem. initdb tests how many connections can start up
when writing the default config, but we assume that each process can
use up to the rlimit file descriptors without running into a
system-wide limit. Raising it to 1024 lets me get two processes
running, which is how I'm running them currently.
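The arithmetic behind that failure can be sketched as follows, assuming a per-process soft limit read with `getrlimit(RLIMIT_NOFILE)` and a system-wide ceiling (on NetBSD, the `kern.maxfiles` sysctl) supplied by the caller. The helper names and the `reserved` parameter are hypothetical; this is not initdb's actual logic, which, per Greg, only considers the per-process limit.

```c
#include <sys/resource.h>

/* Per-process soft limit on open files, or -1 if it cannot be read
   or is unlimited. */
long nofile_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long) rl.rlim_cur;
}

/* How many processes can each use `per_proc` descriptors before a
   system-wide table of `system_max` entries is exhausted, after setting
   aside `reserved` entries for everything else on the machine. */
long processes_that_fit(long system_max, long per_proc, long reserved) {
    if (per_proc <= 0 || system_max <= reserved)
        return 0;
    return (system_max - reserved) / per_proc;
}
```

With illustrative numbers, a 2048-entry system table and a 1024-descriptor per-process limit leave room for only two processes, and any other file use on the machine pushes that down to one.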

Also, I forgot to mention: I also have to raise the stack limit with
ulimit. The default is so small that Postgres calculates the maximum
safe value for its max_stack_depth setting as 0.
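The calculation can be sketched like this, assuming the safe depth is the stack rlimit minus a fixed safety margin. The 512 kB margin and the function names are assumptions for illustration (PostgreSQL has its own constant and code for this). The point is only that any stack rlimit at or below the margin yields 0.

```c
#include <sys/resource.h>

/* Assumed safety margin kept free below the stack limit. */
#define STACK_MARGIN_BYTES ((rlim_t) 512 * 1024)

/* Largest "safe" recursion budget in kB for a given stack rlimit,
   or -1 if the limit is unlimited (no ceiling to derive from). */
long safe_stack_depth_kb(rlim_t stack_limit) {
    if (stack_limit == RLIM_INFINITY)
        return -1;
    if (stack_limit <= STACK_MARGIN_BYTES)
        return 0;   /* the case Greg hit: a tiny default leaves nothing */
    return (long) ((stack_limit - STACK_MARGIN_BYTES) / 1024);
}

/* Same calculation applied to the current soft limit. */
long current_safe_stack_depth_kb(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    return safe_stack_depth_kb(rl.rlim_cur);
}
```

Raising the soft limit with `ulimit -s` before starting the server is what moves the result above zero.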


--
greg


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

Greg Stark
In reply to this post by Tom Lane
On Thu, Aug 20, 2015 at 3:29 PM, Tom Lane <[hidden email]> wrote:

>
>> 4) One of the tablesample tests seems to freeze indefinitely. I
>> haven't looked into why yet. That might indeed indicate that the
>> spinlock code isn't working?
>
> The tablesample tests seem like a not-very-likely first place for such a
> thing to manifest.  What I'm thinking is that there are places in there
> where we loop till we get an expected result.  Offhand I thought they were
> all integer math; but if one was float and the VAX code wasn't doing what
> was expected, maybe we could blame this on float discrepancies as well.

Ah, I was wrong. It's not the tablesample test -- I think that was the
last one to complete. Annoyingly we don't seem to print test names
until they finish.

It was groupingsets. And it's stuck again on the same query:

regression=# select pid,now()-query_start,now()-state_change,waiting,state,query from pg_stat_activity where pid <> pg_backend_pid();
+------+-----------------+-----------------+---------+--------+------------------------------------------------------+
| pid  |    ?column?     |    ?column?     | waiting | state  |                        query                         |
+------+-----------------+-----------------+---------+--------+------------------------------------------------------+
| 9185 | 00:53:38.571552 | 00:53:38.571552 | f       | active | select a, b, grouping(a,b), sum(v), count(*), max(v)+|
|      |                 |                 |         |        |   from gstest1 group by rollup (a,b);                |
+------+-----------------+-----------------+---------+--------+------------------------------------------------------+

It's only been stuck for an hour, so it's possible it's still running,
but this morning the same query had been running for 7 hours, so I'm
guessing not.

Unfortunately I appear to have built without debugging symbols, so
it'll be a couple of days before I can rebuild with symbols to get a
backtrace. (I vaguely remember when builds took hours, but I don't
recall ever having to wait 48 hours for a build, even back then.)

--
greg


Re: [HACKERS] PostgreSQL for VAX on NetBSD/OpenBSD

Andres Freund-2
On 2015-08-20 16:42:21 +0100, Greg Stark wrote:

> Ah, I was wrong. It's not the tablesample test -- I think that was the
> last one to complete. Annoyingly we don't seem to print test names
> until they finish.
>
> It was groupingsets. And it's stuck again on the same query:
>
> regression=# select pid,now()-query_start,now()-state_change,waiting,state,query from pg_stat_activity where pid <> pg_backend_pid();
> +------+-----------------+-----------------+---------+--------+------------------------------------------------------+
> | pid  |    ?column?     |    ?column?     | waiting | state  |                        query                         |
> +------+-----------------+-----------------+---------+--------+------------------------------------------------------+
> | 9185 | 00:53:38.571552 | 00:53:38.571552 | f       | active | select a, b, grouping(a,b), sum(v), count(*), max(v)+|
> |      |                 |                 |         |        |   from gstest1 group by rollup (a,b);                |
> +------+-----------------+-----------------+---------+--------+------------------------------------------------------+
>
> It's only been stuck an hour so it's possible it's still running but
> this morning it was the same query that was running for 7 hours so I'm
> guessing not.

Interesting.

> Unfortunately I appear to have built without debugging symbols so
> it'll be a couple days before I can rebuild with symbols to get a back
> trace. (I vaguely remember when builds took hours but I don't recall
> ever having to wait 48 hours for a build even back then)

Without any further clues I'd guess it's stuck somewhere in
bipartite_match.c. That's the only place where floating point problems
could possibly result in getting stuck.
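Without claiming this is what bipartite_match.c actually does, here is a sketch of the general pattern under suspicion: a breadth-first layering pass (the kind used in Hopcroft-Karp matching) in which "not yet reached" is marked by a sentinel distance. Using an integer sentinel such as `SHRT_MAX` rather than a floating-point infinity makes every comparison exact on any FPU, VAX included. The adjacency-matrix representation, `MAX_NODES`, `UNREACHED`, and `bfs_distances` are all hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 16
#define UNREACHED SHRT_MAX   /* integer "infinity": exact on any FPU */

/* Breadth-first distances from `src` over an adjacency matrix of `n`
   nodes. Nodes that cannot be reached keep the sentinel UNREACHED. */
void bfs_distances(int n, bool adj[MAX_NODES][MAX_NODES],
                   int src, short dist[MAX_NODES]) {
    int queue[MAX_NODES];   /* each node is enqueued at most once */
    int head = 0, tail = 0;

    for (int i = 0; i < n; i++)
        dist[i] = UNREACHED;
    dist[src] = 0;
    queue[tail++] = src;

    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++) {
            /* Exact integer test: no rounding can make a visited node
               look unvisited, so the loop always terminates. */
            if (adj[u][v] && dist[v] == UNREACHED) {
                dist[v] = (short) (dist[u] + 1);
                queue[tail++] = v;
            }
        }
    }
}
```

With a float sentinel, the same loop depends on the platform producing and comparing a true infinity, which is exactly what Greg reports is missing on the VAX.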


I'm all for making sure these issues are indeed caused by platform
specific float oddities, and not a more fundamental problem. But to me
the state of this port, as evidenced in this thread, seems to be too bad
to be worthwhile keeping alive. Especially since there's really no
imaginable use case except for playing around.

Greetings,

Andres Freund


Re: PostgreSQL for VAX on NetBSD/OpenBSD

Greg Stark
In reply to this post by John Klos
On Wed, Jun 25, 2014 at 10:00 AM, Anders Magnusson <[hidden email]>
wrote:

> John Klos skrev 2014-06-25 04:16:
>
> Then the machine panicked. The serial console showed:
>
> panic: usrptmap space leakage
> cpu0: Begin traceback...
> panic: usrptmap space leakage
> Stack traceback :
>          Process is executing in user space.
> cpu0: End traceback...
>
> Hm, can you add info about this panic to PR #28379? I will try to hunt
> this down soon, so I need some test cases.
>

Is this still of interest btw? What kind of info do you need?

--
greg