* The problem is that on AMD64, DMA is limited to 32-bit
addressing. I guess this is because, while all AMD64 CPUs support
64-bit DMA, popular PCI devices and supporting hardware such as
bridges have DMA functionality limited to 32 bits.
(Is this a feature of lower-quality hardware or of very old PCI
devices, or is it systemic to the whole AMD64 ecosystem today?
Could a system be configured to use 64-bit DMA on AMD64 and be
expected to work, presuming recent or higher-quality / well-selected
* The OS asks the disk hardware to load disk data into given memory
locations via DMA, and userland fread() and mmap() are then fed with
that data, with no need for further data moving or mapping. These are
the dynamics leading to the 4GB cap.
And the 4GB cap is rather constraining for any computer with a lot of
RAM and heavy disk reading: many reads that would not need to hit the
disk (because the data could be cached in all this free memory) are
not cached and go to disk anyway, which takes a lot of time, yes?
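
To make the constraint concrete, here is a minimal sketch of the
check that a 32-bit DMA limit implies: a buffer is only usable as a
DMA target if it lies entirely below the 4GB physical boundary. The
names DMA32_LIMIT and dma32_reachable are made up for illustration
and are not OpenBSD kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First physical address a 32-bit DMA engine cannot express (4GB). */
#define DMA32_LIMIT (UINT64_C(1) << 32)

/*
 * Hypothetical helper: true if the physical range [paddr, paddr + len)
 * lies entirely below 4GB. Written so that paddr + len cannot overflow.
 */
static bool
dma32_reachable(uint64_t paddr, uint64_t len)
{
	assert(len > 0);
	return paddr < DMA32_LIMIT && len <= DMA32_LIMIT - paddr;
}
```

Under this assumption, a buffer cache whose pages must all pass such a
check can never grow past 4GB, no matter how much RAM is installed.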
* This was recognized a long time ago, and Bob wrote a solution in
the form of a "buffer cache flipper" that would push buffer cache
data out of the 32-bit area (to "high memory", as in above 4GB), hence
lifting the limit, via a "(generic) backpressure" mechanism that as
a bonus used the DMA engine to do the memory moving. I guess this
means the buffer cache would cost the CPU next to nothing, which
sounds incredibly neat!
But then it didn't really work: it malfunctioned and irritated people
(it was "busted"; for what reason, actually?), and Theo wrote that it
will be fixed in the future.
Has it been fixed since?
Also, once it is fixed, fread() and mmap() reads of data that is in
the buffer cache will be incredibly fast, right? Under optimal
conditions the mmap'ed addresses will already be mapped to the buffer
cache data, so reads of mmap'ed buffer cache data will have the speed
of any memory access, right?
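
For reference, the access pattern in question can be sketched with
plain POSIX calls. Once a file is mapped, every read is an ordinary
memory load; the first touch of each page may fault and trigger I/O,
while later touches do not. Whether the mapped pages are the very same
pages the buffer cache holds is exactly the open question above, and
this sketch does not assume either way. The function name mmap_sum is
made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a file read-only and add up its bytes through the mapping.
 * Returns 0 and stores the sum in *sum on success, -1 on error.
 */
static int
mmap_sum(const char *path, uint64_t *sum)
{
	struct stat st;
	unsigned char *p;
	off_t i;
	int fd;

	assert(path != NULL && sum != NULL);
	*sum = 0;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1) {
		close(fd);
		return -1;
	}
	if (st.st_size == 0) {		/* mmap() rejects zero-length maps */
		close(fd);
		return 0;
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping outlives the descriptor */
	if (p == MAP_FAILED)
		return -1;

	/* Plain loads: no read() call and no copy into a user buffer. */
	for (i = 0; i < st.st_size; i++)
		*sum += p[i];

	munmap(p, (size_t)st.st_size);
	return 0;
}
```

Calling mmap_sum() twice on the same large file and timing both calls
is a simple way to see how much the second pass gains from cached data.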
Last, OpenBSD's biggest limit as an OS seems to be that the disk/file
subsystem is sequential. A modern SSD can read at 2.8GB/sec, but that
requires parallelism; without multiqueueing and with small reads, e.g.
4KB or smaller, speeds stay around 70-120MB/sec = ~3.5% of the
hardware's potential performance. This would be a really worthy goal
to donate to, for instance, in particular as OpenBSD leads the way in many
Are there any thoughts about implementing this in the future?
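
The arithmetic behind those figures can be sketched as follows. With
one request in flight at a time, throughput is simply read size
divided by per-request latency, so 70-120MB/sec at 4KB corresponds to
roughly 34-59 microseconds per read. The sketch assumes decimal
megabytes (1MB = 10^6 bytes) and, for the queue depth estimate, that
per-request latency stays the same when requests overlap, which real
hardware only approximates. All function names are made up for
illustration.

```c
#include <assert.h>

/* Per-request latency in microseconds implied by a one-at-a-time
 * throughput (MB/sec, 1MB = 10^6 bytes) and a read size in bytes. */
static double
qd1_latency_us(double mb_per_s, double read_bytes)
{
	assert(mb_per_s > 0.0 && read_bytes > 0.0);
	return read_bytes / mb_per_s;	/* bytes / (bytes per microsecond) */
}

/* Fraction of the device's peak throughput actually achieved. */
static double
fraction_of_peak(double mb_per_s, double peak_mb_per_s)
{
	assert(peak_mb_per_s > 0.0);
	return mb_per_s / peak_mb_per_s;
}

/* Requests that must be in flight to reach the peak, assuming latency
 * per request does not change under concurrency (an idealization). */
static double
depth_needed(double peak_mb_per_s, double qd1_mb_per_s)
{
	assert(qd1_mb_per_s > 0.0);
	return peak_mb_per_s / qd1_mb_per_s;
}
```

With the numbers above, fraction_of_peak(70, 2800) is 2.5% and
fraction_of_peak(120, 2800) is about 4.3%, which brackets the ~3.5%
quoted, and depth_needed() comes out at roughly 23 to 40 outstanding
4KB reads to saturate a 2.8GB/sec device.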
> Previously there was a years-long thread about a 4GB (32bit) buffer
> cache constraint on AMD64, ref
> https://marc.info/?t=146824436600004&r=1&w=2 .
> What I gather is,
> * The problematique is that on AMD64, DMA is limited to 32bit
> addressing, I guess because unlike AMD64 arch CPU:s which all have
> 64bit DMA support, popular PCI accessories and supporting hardware
> out there like bridges, have DMA functionality limited to 32bit
My read of that thread, particularly Theo's comments, is that no one
actually demonstrated a case where lack of 64bit DMA caused any problems or
If you have a system and a use case where the lack of 64-bit DMA
creates a performance limitation, then describe it and, *more
importantly*, *why* you think the DMA limit is involved.