Made of three processing rings, with 3 control wires: to the directly opposite ring segment and its two neighbours. This is your double data rate, or dead beef, and the global clock. The local clock is the segment and its immediate neighbours. Stack three of them, add a dimension to the topology, and add as many datapaths as possible between the faster parts of the system, with digital sync between the local clock and the speed of light in vacuum. This is an architecture where scatter-gather is extremely useful, as that works on the global clock. So a total of 18 dies and a very difficult juggling act, where cable lengths are legendary for the premium original Crays. If you think you have a problem with your local segment, just feed beef.

Not many explanations of this architecture are around, only culture references like Cult of the Dead Cow as a pun, and wishes on those that occupied the whole system. Anyone that's been around a real one and would know? If you want to know what's inside a Cray, it's basically evil inside, if you thought that would reveal something.

-- Balder Oddson |
GPT-3 gone wild, or what? Definitely too late for April Fools' Day.
|
In reply to this post by Balder Oddson
On Fri, Apr 02, 2021 at 02:39:42PM +0200, Balder Oddson wrote:
> Made of three processing rings, with 3 control wires: to the directly
> opposite ring segment and its two neighbours. This is your double data
> rate, or dead beef, and the global clock. The local clock is the segment
> and its immediate neighbours. Stack three of them, add a dimension to
> the topology, and add as many datapaths as possible between the faster
> parts of the system, with digital sync between the local clock and the
> speed of light in vacuum. This is an architecture where scatter-gather
> is extremely useful, as that works on the global clock. So a total of 18
> dies and a very difficult juggling act, where cable lengths are
> legendary for the premium original Crays. If you think you have a
> problem with your local segment, just feed beef.
>
> Not many explanations of this architecture are around, only culture
> references like Cult of the Dead Cow as a pun, and wishes on those that
> occupied the whole system. Anyone that's been around a real one and
> would know? If you want to know what's inside a Cray, it's basically
> evil inside, if you thought that would reveal something.

Yes and no, as this likely works because of direct wires, the shortest distance, and the speed of light in the material acting as the clock. The simplest setup is one ring with 6 sockets; what's on each segment is a beef, or a processor as usual. The guarantees on digital sync are that it knows:

#1 whether it is being written to, or writing to another.
#2 that you are beef, and may or may not be doing a shared task.
#3 whether it is idle or beef, the exception level, and local/global root.

This is important, as the digital clock should be the same as the wired clock, where the die clock can skew just fine as long as being in the state of feedbeef or deadbeef is very tight. This is the general-purpose brute-force method you have: scattering instructions in memory to your exact opposite node in the circle, with or without your neighbours.
This allows wriggle room where this may work, and where spending extra on cooling, and perhaps carbon nanotubes for the wires, could make this cache-coherent beast fly. There are the pop-culture references like feedbeef, deadcow, deadbeef and feedface (terminal), and likewise the temptation of calling it a scalar-vector machine data-core, as it's not an inefficient or rubbish architecture, just complicated in this 6-segment configuration. Due to the ability to skew, it's practically going faster than the speed of light, on the premise that it is cache coherent with control wires to the directly opposite node and its neighbours, not your own, with just one datapath across with wires for each segment. You SIMD and vector scatter and gather as if it weren't for Cray aspirations in most things ever since. And it should be open to relying on some ideal properties and quirks. How that system would behave and make noise I don't know, but you could likely guess when it was writing the results, or gathering them in memory. Doubt this would be interesting for bitcoin, but you should be able to scrub any size link you can fit on a segment.

Many old and cool antique architectures; Cray is the premier architecture, he promised 10x performance and did so. Not likely to get one on eBay to boot BSD on, and not sure if you can get the OS or blueprints either. |
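The ring wiring described in the posts above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not a description of any real Cray: the function names `control_targets` and `local_group` are hypothetical, segments are simply numbered 0 to 5 around the ring, and the "18 dies" figure is read here as three stacked rings of six segments each.

```python
# Hypothetical model of the ring wiring described in the thread.
# Assumptions: segments are numbered 0..5 around one ring, and the
# "total 18 dies" is three stacked rings of six segments each.
SEGMENTS_PER_RING = 6   # "one ring with 6 sockets"
STACKED_RINGS = 3       # "Stack three of them"


def control_targets(segment: int, n: int = SEGMENTS_PER_RING) -> list[int]:
    """Segments reached by the control wires: the directly opposite
    segment and that segment's two neighbours."""
    opposite = (segment + n // 2) % n
    return [(opposite - 1) % n, opposite, (opposite + 1) % n]


def local_group(segment: int, n: int = SEGMENTS_PER_RING) -> list[int]:
    """The 'local clock' group: the segment and its immediate neighbours."""
    return [(segment - 1) % n, segment, (segment + 1) % n]


if __name__ == "__main__":
    for s in range(SEGMENTS_PER_RING):
        print(s, "local:", local_group(s), "control wires to:", control_targets(s))
    print("total dies:", SEGMENTS_PER_RING * STACKED_RINGS)
```

In this numbering, each segment's local group and its control-wire targets together cover all six segments of the ring exactly once, which is one way to read the split between the "local clock" and the "global clock" in the posts.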
On Sat, 3 Apr 2021 at 10:09, Balder Oddson <[hidden email]> wrote:
[...]
> Many old and cool antique architectures; Cray is the premier
> architecture, he promised 10x performance and did so. Not likely to get
> one on eBay to boot BSD on, and not sure if you can get the OS or
> blueprints either.

To drag this a tiny bit toward the approximate direction of being on-topic: if you do find one and want to run OpenSSH on it, you'll need to use 7.6p1 or earlier since I removed UNICOS support in 7.7p1 ( https://github.com/openssh/openssh-portable/commit/ddc0f3814881ea279a6b6d4d98e03afc60ae1ed7 ).

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new)
Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement. |
In reply to this post by Benjamin Baier
> On 2 Apr 2021, at 14:17, Benjamin Baier <[hidden email]> wrote:
>
> GPT-3 gone wild, or what? Definitely too late for April Fools' Day.

If it’s GPT-3, it’s slipping. |
On Sat, Apr 03, 2021 at 04:06:42AM +0100, Joe Davis wrote:
> > On 2 Apr 2021, at 14:17, Benjamin Baier <[hidden email]> wrote:
> >
> > GPT-3 gone wild, or what? Definitely too late for April Fools' Day.
>
> If it’s GPT-3, it’s slipping.

Yes and no, but if you draw the architecture up: 6 segments in a circle with flat sides, close together. One control line for double data rate to the opposite segment and its neighbours, such that the only data path goes straight forward. Let's imagine that each segment is the equivalent of 16*32-bit vector operations per core per cycle, and that the chip maths the speed of light across this hexagon or whatever, such that you can pull and push on this link so hard you cause bremsstrahlung from trying to go too fast in parts of the segment or chip, killing parts of it over time and leaving it inoperable during the operation.

Before you say that it's insane to run this at 10 GHz, and that a von Neumann architecture is better or has a better-tuned pipeline: I'll pump my neighbouring nodes at full speed. Each clock cycle gives each segment a state of 0xfeedbeef, 0xdeadbeef, 0xbeef or 0xfeedface. So the two neighbouring segments do deadbeef and use the beefy link to pump data to the other half of the CPU, and I'll start doing remote DDR SRAM operations to drive it as a von Neumann chip.

Which patent would you suggest for this if the important vectorization is done in software, in a UNIX model that should run on it, where some things are physical necessities, like a UNIX console to a segment and a daemon that filters instructions and data and handles address space?

-- Balder Oddson |
In reply to this post by Joe Davis
On Sat, Apr 03, 2021 at 04:06:42AM +0100, Joe Davis wrote:
> > On 2 Apr 2021, at 14:17, Benjamin Baier <[hidden email]> wrote:
> >
> > GPT-3 gone wild, or what? Definitely too late for April Fools' Day.
>
> If it’s GPT-3, it’s slipping.

Yes and no, but if you draw the architecture up: 6 segments in a circle with flat sides, close together. One control line for double data rate to the opposite segment and its neighbours, such that the only data path goes straight forward. Let's imagine that each segment is the equivalent of 16*32-bit vector operations per core per cycle, and that the chip maths the speed of light across this hexagon or whatever, such that you can pull and push on this link so hard you cause bremsstrahlung from trying to go too fast in parts of the segment or chip, killing parts of it over time and leaving it inoperable during the operation.

Before you say that it's insane to run this at 10 GHz, and that a von Neumann architecture is better or has a better-tuned pipeline: I'll pump my neighbouring nodes at full speed. Each clock cycle gives each segment a state of 0xfeedbeef, 0xdeadbeef, 0xbeef or 0xfeedface. So the two neighbouring segments do deadbeef and use the beefy link to pump data to the other half of the CPU, and I'll start doing remote DDR SRAM operations to drive it as a von Neumann chip.

Which patent would you suggest for this if the important vectorization is done in software, in a UNIX model that should run on it, where some things are physical necessities, like a UNIX console to a segment and a daemon that filters instructions and data and handles address space?

You have your big lock that mainly creates the machine state every clock cycle. There are six fully functional segments that must initialise and run a local terminal. Very few have a relationship to Cray; I don't, neither to the original nor to modern Crays.
If you open up a Cray to try and work out how it works, you find empty space with a bunch of wires, get angry at the evil inside and go with a bunch of DECs, as they don't involve physics shenanigans and actually have the important part inside. But it is easier to tweak your digital spec based on the length of the wires. There was possibly even a reason for picking Intel, as they focused on the part everyone liked about IBM compared to Crays. |