The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

7 messages

Balder Oddson
It's made of three processing rings, with 3 control wires: one to the
directly opposite ring segment and one to each of its two neighbours. This
is your double data rate, or dead beef, and the global clock. The local
clock is the segment and its immediate neighbours. Stack three of them, add
a dimension to the topology, and put as many datapaths as possible between
the faster parts of the system, with digital sync between the local clock
and the speed of light in vacuum. That is an architecture where
scatter-gather is extremely useful, as it works on the global clock. So
that is 18 dies in total and a very difficult juggling act, and the cable
lengths of the original premium Crays are legendary for it. If you think
you have a problem with your local segment, just feed beef.
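To make the wiring concrete, here is a minimal Python sketch under stated assumptions: one ring of 6 segments numbered 0 to 5, where each segment's 3 control wires go to the directly opposite segment and to that segment's two neighbours (a later message in this thread says "not your own"), and three stacked rings giving the 18 dies. The names `control_targets` and `total_dies` are hypothetical, for illustration only.

```python
# Hypothetical model of one ring: 6 segments numbered 0..5. Each segment's
# 3 control wires go to the directly opposite segment and to that segment's
# two neighbours (not the sender's own neighbours). Illustrative only.

SEGMENTS_PER_RING = 6
RINGS = 3  # three stacked rings, as in the post


def control_targets(seg: int, n: int = SEGMENTS_PER_RING) -> list[int]:
    """Return the segments that `seg` drives control wires to."""
    opposite = (seg + n // 2) % n
    # The opposite segment plus its left and right ring neighbours.
    return [(opposite - 1) % n, opposite, (opposite + 1) % n]


def total_dies(rings: int = RINGS, n: int = SEGMENTS_PER_RING) -> int:
    """Stacking `rings` rings of `n` segments gives rings * n dies."""
    return rings * n


if __name__ == "__main__":
    for s in range(SEGMENTS_PER_RING):
        print(s, "->", control_targets(s))  # e.g. 0 -> [2, 3, 4]
    print("dies:", total_dies())  # dies: 18
```

Segment 0, for example, wires to 2, 3 and 4 and never to its own neighbours 1 and 5.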

There aren't many explanations of this architecture around, only culture
references like Cult of the Dead Cow as a pun, and wishes on those who
occupied the whole system. Is there anyone who has been around a real one
and would know? If you want to know what's inside a Cray and thought
opening one would reveal something, it's basically evil inside.

--
Balder Oddson

Re: The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

Benjamin Baier
GPT-3 gone wild, or what? Definitely too late for April Fools' Day.

Re: The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

Balder Oddson
In reply to this post by Balder Oddson
On Fri, Apr 02, 2021 at 02:39:42PM +0200, Balder Oddson wrote:

> It's made of three processing rings, with 3 control wires: one to the
> directly opposite ring segment and one to each of its two neighbours. This
> is your double data rate, or dead beef, and the global clock. The local
> clock is the segment and its immediate neighbours. Stack three of them, add
> a dimension to the topology, and put as many datapaths as possible between
> the faster parts of the system, with digital sync between the local clock
> and the speed of light in vacuum. That is an architecture where
> scatter-gather is extremely useful, as it works on the global clock. So
> that is 18 dies in total and a very difficult juggling act, and the cable
> lengths of the original premium Crays are legendary for it. If you think
> you have a problem with your local segment, just feed beef.
>
> There aren't many explanations of this architecture around, only culture
> references like Cult of the Dead Cow as a pun, and wishes on those who
> occupied the whole system. Is there anyone who has been around a real one
> and would know? If you want to know what's inside a Cray and thought
> opening one would reveal something, it's basically evil inside.
>

Yes and no, as this likely works because of direct wires, the shortest
distance, and the speed of light in the material acting as the clock. The
simplest setup is one ring with 6 sockets, with a beef, or a processor as
usual, on each segment. Digital sync guarantees that each segment knows:
#1 whether it is being written to, or writing to another;
#2 that it is beef, and may or may not be doing a shared task;
#3 whether it is idle or beef, its exception level, and local/global root.

This is important, because the digital clock should be the same as the
wired clock. The die clock can skew just fine, as long as the timing of
the feedbeef and deadbeef states is very tight. This is the general-purpose
brute-force method you have: scattering instructions in memory to the node
exactly opposite you in the circle, with or without your neighbours. That
leaves some wiggle room where this may work, and where spending extra on
cooling, and perhaps carbon nanotubes for the wires, could make this
cache-coherent beast fly.
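The wired-clock constraint can be put in numbers. A rough sketch, assuming a signal travels at the speed of light times a velocity factor (0.66 here, roughly that of polyethylene coax, used purely as an illustrative value): it prints how far a signal gets in one clock period at 80 MHz (the Cray-1's clock rate), 1 GHz and 10 GHz. `distance_per_cycle_m` and `VELOCITY_FACTOR` are hypothetical names.

```python
# Rough propagation check: how far a signal travels in one clock period.
# VELOCITY_FACTOR is an assumed, illustrative fraction of c for a cable;
# real wires, traces and dielectrics differ.

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum, exact by definition
VELOCITY_FACTOR = 0.66            # assumption, roughly polyethylene coax


def distance_per_cycle_m(clock_hz: float, velocity_factor: float = 1.0) -> float:
    """Distance in metres a signal covers during one clock period."""
    period_s = 1.0 / clock_hz
    return C_VACUUM_M_PER_S * velocity_factor * period_s


if __name__ == "__main__":
    for label, hz in (("80 MHz", 80e6), ("1 GHz", 1e9), ("10 GHz", 10e9)):
        vac_cm = distance_per_cycle_m(hz) * 100
        wire_cm = distance_per_cycle_m(hz, VELOCITY_FACTOR) * 100
        print(f"{label:>7}: {vac_cm:8.2f} cm in vacuum, {wire_cm:8.2f} cm in cable")
```

At 10 GHz a signal covers about 3 cm per cycle even in vacuum, against roughly 3.7 m at 80 MHz, which is why wire length becomes part of the timing budget as the clock goes up.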

Then there are these pop-culture references like feedbeef, deadcow,
deadbeef and feedface (terminal), and likewise the temptation to call it a
scalar-vector machine data core, as it's not an inefficient or rubbish
architecture, just complicated by this 6-segment configuration.

Due to the ability to skew, it's practically going faster than the speed
of light, on the premise that it is cache coherent, with control wires to
the directly opposite node and its neighbours (not your own) and just one
datapath across, with wires for each segment. You SIMD and vector scatter
and gather as if it weren't for the Cray aspirations in most things ever
since.
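For readers less familiar with the terms: gather and scatter are indexed loads and stores. Gather reads `src[idx[i]]` into a dense vector, and scatter writes a dense vector back to `dst[idx[i]]`. The sketch below shows only the access pattern, in plain Python; `gather` and `scatter` are illustrative names, not any Cray instruction set.

```python
# Minimal scatter/gather sketch over plain Python lists, illustrating the
# indexed access pattern that vector machines accelerate in hardware.


def gather(src: list[float], idx: list[int]) -> list[float]:
    """out[i] = src[idx[i]]: collect scattered elements into a dense vector."""
    return [src[j] for j in idx]


def scatter(dst: list[float], idx: list[int], vals: list[float]) -> None:
    """dst[idx[i]] = vals[i]: write a dense vector back to scattered slots."""
    if len(idx) != len(vals):
        raise ValueError("idx and vals must have the same length")
    for j, v in zip(idx, vals):
        dst[j] = v


if __name__ == "__main__":
    memory = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
    idx = [5, 1, 3]
    dense = gather(memory, idx)  # [50.0, 10.0, 30.0]
    scatter(memory, idx, [x + 1 for x in dense])
    print(dense, memory)
```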

And it should be open to relying on some ideal properties and quirks.
I don't know how that system would behave or what noise it would make,
but you could likely guess when it was writing the results or gathering
them in memory.

I doubt this would be interesting for Bitcoin, but you should be able to
scrub any size of link you can fit on a segment.

Of the many old and cool antique architectures, Cray is the premier one:
Seymour Cray promised 10x performance and delivered. You're not likely to
get one on eBay to boot BSD on, and I'm not sure you can get the OS or
blueprints either.


Re: The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

Darren Tucker-3
On Sat, 3 Apr 2021 at 10:09, Balder Oddson <[hidden email]> wrote:
[...]

> Of the many old and cool antique architectures, Cray is the premier one:
> Seymour Cray promised 10x performance and delivered. You're not likely to
> get one on eBay to boot BSD on, and I'm not sure you can get the OS or
> blueprints either.
>

To drag this a tiny bit toward the approximate direction of being on-topic:
if you do find one and want to run OpenSSH on it, you'll need to use 7.6p1
or earlier, since I removed UNICOS support in 7.7p1
(https://github.com/openssh/openssh-portable/commit/ddc0f3814881ea279a6b6d4d98e03afc60ae1ed7).

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860  37F4 9357 ECEF 11EA A6FA (new)
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.

Re: The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

Joe Davis
In reply to this post by Benjamin Baier

> On 2 Apr 2021, at 14:17, Benjamin Baier <[hidden email]> wrote:
>
> GPT-3 gone wild, or what? Definitely too late for April Fools' Day.
>

If it’s GPT-3, it’s slipping.

Re: The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

Balder Oddson
On Sat, Apr 03, 2021 at 04:06:42AM +0100, Joe Davis wrote:
>
> > On 2 Apr 2021, at 14:17, Benjamin Baier <[hidden email]> wrote:
> >
> > GPT-3 gone wild, or what? Definitely too late for April Fools' Day.
> >
>
> If it’s GPT-3, it’s slipping.

Yes and no, but if you draw the architecture up:
6 segments in a circle, with flat sides, close together.
One control line for double data rate to the opposite segment and its
neighbours, such that the only data path goes straight across.
Let's imagine that each segment is the equivalent of 16 x 32-bit vector
operations per core per cycle, and that the chip does the maths on the
speed of light across this hexagon or whatever, such that you can pull and
push on this link so hard that you cause bremsstrahlung by trying to go too
fast in parts of the segment or chip, killing parts of it over time and
leaving them inoperable during the operation.

Before you say that it's insane to run this at 10 GHz, and that a von
Neumann architecture is better or has a better-tuned pipeline: I'll pump
my neighbouring nodes at full speed.
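The figures in this message can be multiplied out. A back-of-the-envelope sketch, assuming one core per segment and that every lane issues one 32-bit operation every cycle (a theoretical peak): 16 lanes x 10 GHz x 6 segments. The operand-traffic line additionally assumes one 4-byte operand read per operation. `peak_ops_per_second` is a hypothetical name.

```python
# Theoretical peak for the figures in the message: 16 lanes of 32-bit vector
# operations per cycle, a 10 GHz clock and 6 segments. One core per segment
# and one operation per lane per cycle are assumptions.


def peak_ops_per_second(lanes: int, clock_hz: float, segments: int,
                        cores_per_segment: int = 1) -> float:
    """Upper bound: every lane of every core issues one op every cycle."""
    return lanes * clock_hz * segments * cores_per_segment


if __name__ == "__main__":
    ops = peak_ops_per_second(lanes=16, clock_hz=10e9, segments=6)
    print(f"{ops:.3e} 32-bit ops/s")  # 9.600e+11
    # Assumption: each op reads one 4-byte (32-bit) operand.
    print(f"{ops * 4 / 1e12:.2f} TB/s of operand reads")
```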

Each clock cycle gives each segment one of the states 0xfeedbeef,
0xdeadbeef, 0xbeef or 0xfeedface.
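These values read like classic debugging magic numbers. The post does not say precisely what each state means, so the sketch below only shows how such per-cycle status words could be encoded and decoded, one 32-bit word per segment. `SegmentState` and `decode` are hypothetical names, and 0xbeef is assumed to be zero-extended to 0x0000BEEF.

```python
from enum import IntEnum
from typing import Optional


class SegmentState(IntEnum):
    """Hypothetical per-cycle status words taken from the post.

    The post gives no precise meaning for each value, so these are only
    sentinel encodings in the style of debugging magic numbers.
    """
    FEEDBEEF = 0xFEEDBEEF
    DEADBEEF = 0xDEADBEEF
    BEEF = 0x0000BEEF  # assumed zero-extended to 32 bits
    FEEDFACE = 0xFEEDFACE


def decode(word: int) -> Optional[SegmentState]:
    """Map a status word (low 32 bits) to a known state, or None."""
    word &= 0xFFFFFFFF
    try:
        return SegmentState(word)
    except ValueError:
        return None


if __name__ == "__main__":
    # One hypothetical clock cycle: one status word per segment (6 segments).
    cycle = [0xFEEDBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xBEEF, 0xFEEDFACE, 0x12345678]
    for seg, word in enumerate(cycle):
        state = decode(word)
        print(seg, hex(word), state.name if state is not None else "unknown")
```

Unrecognised words decode to `None`, so a corrupted or unexpected status word is reported rather than silently mapped to a state.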

So while the two neighbouring segments do deadbeef and use the beefy link
to pump data to the other half of the CPU, I'll start doing remote DDR
SRAM operations to drive it as a von Neumann chip.

Which patent would you suggest for this, if the important vectorization
is done in software, in a UNIX model that should run on it, where some
things are physical necessities, like a UNIX console on a segment and a
daemon that filters instructions and data and handles the address space?


--
Balder Oddson

Re: The simplest full Cray data core with 3 CPUs and a physics hack that makes it work

Balder Oddson
In reply to this post by Joe Davis
On Sat, Apr 03, 2021 at 04:06:42AM +0100, Joe Davis wrote:
>
> > On 2 Apr 2021, at 14:17, Benjamin Baier <[hidden email]> wrote:
> >
> > GPT-3 gone wild, or what? Definitely too late for April Fools' Day.
> >
>
> If it’s GPT-3, it’s slipping.

Yes and no, but if you draw the architecture up:
6 segments in a circle, with flat sides, close together.
One control line for double data rate to the opposite segment and its
neighbours, such that the only data path goes straight across.
Let's imagine that each segment is the equivalent of 16 x 32-bit vector
operations per core per cycle, and that the chip does the maths on the
speed of light across this hexagon or whatever, such that you can pull and
push on this link so hard that you cause bremsstrahlung by trying to go too
fast in parts of the segment or chip, killing parts of it over time and
leaving them inoperable during the operation.

Before you say that it's insane to run this at 10 GHz, and that a von
Neumann architecture is better or has a better-tuned pipeline: I'll pump
my neighbouring nodes at full speed.

Each clock cycle gives each segment one of the states 0xfeedbeef,
0xdeadbeef, 0xbeef or 0xfeedface.

So while the two neighbouring segments do deadbeef and use the beefy link
to pump data to the other half of the CPU, I'll start doing remote DDR
SRAM operations to drive it as a von Neumann chip.

Which patent would you suggest for this, if the important vectorization
is done in software, in a UNIX model that should run on it, where some
things are physical necessities, like a UNIX console on a segment and a
daemon that filters instructions and data and handles the address space?

You have your big lock that mainly creates the machine state every clock
cycle. There are six fully functional segments that must initialise and
run a local terminal.

Very few people have a relationship to Cray; I don't, with either the
original or the modern Crays. If you open up a Cray to try to work out how
it works, you find empty space with a bunch of wires, get angry at the evil
inside and go with a bunch of DECs instead, as they don't involve physics
shenanigans and actually have the important part inside.
But it's easier to tweak your digital spec based on the length of the
wires. There may even have been a reason for picking Intel, as they
focused on the part everyone liked about IBM compared to Crays.