I’m dabbling a bit in retro programming, apparently because I have nothing better to do; it is a field of joy in a world of pain. Common wisdom is that the IBM PC can’t do anything in parallel: there is only a single CPU, no GPU (in the modern sense) and usually no co-processor. There is an Intel 8237 DMA chip that can move data between devices and memory on its own, but it supposedly can’t be used for RAM-to-RAM copies or for copies to and from the graphics card [DoD]. Or can it?
The smart folks over at the Vintage Computer Federation found a (hacky) way to do just that [MMD]. Memory-to-memory DMA is possible on the IBM PC/XT, even though Channel 0 is normally reserved for DRAM refresh. The catch: the transfer must be short, because RAM refresh is switched off while it runs, and the source and destination must lie in the same 64 KB page, because Channels 0 and 1 share a page register.
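The 8237 only generates the low 16 address bits; the upper four (A16-A19) come from a page register, and with Channels 0 and 1 sharing one, both the source range and the destination range have to fit inside the same 64 KB page. A minimal C sketch of that check, assuming plain 20-bit linear addresses; `linear_address` and `same_dma_page` are hypothetical helper names, not code from the forum thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real-mode segment:offset to a 20-bit linear address. The 8088 has only
 * 20 address lines, so the result wraps at 1 MB. */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* True if `len` bytes at `src` and `len` bytes at `dst` all share the same
 * upper four address bits, i.e. one page register value covers both. */
static bool same_dma_page(uint32_t src, uint32_t dst, uint32_t len)
{
    assert(len >= 1 && len <= 0x10000u);
    uint32_t page = src >> 16;
    return ((src + len - 1) >> 16) == page
        && (dst >> 16) == page
        && ((dst + len - 1) >> 16) == page;
}
```

For example, two 8000-byte areas starting at B8000h and B9F40h (both inside the CGA window at segment B800h) pass the check, while a copy from system RAM in a lower page into CGA memory does not.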
Sequence
- Disable RAM refresh by stopping the PIT timer (counter 1, whose output triggers the refresh DMA on Channel 0).
- Set the “extended write selection” bit in the command register. Without it, copies into CGA card memory fail for timing reasons: the write line is active for only one cycle, so the card’s wait-state signal is missed.
- Set memory-to-memory mode: Channel 0 for reading, Channel 1 for writing.
- Mask off both channels, trigger a software DMA request for Channel 0, and wait for Terminal Count on Channel 1.
- Re-enable RAM refresh (restart the PIT timer and restore Channel 0’s refresh settings).
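The steps above can be sketched in C. This sketch does not touch hardware: `port_write` and `port_read` are hypothetical stand-ins that log port writes and fake the status register, so on a real machine you would swap in your compiler’s port I/O (for example `outp`/`inp` in Microsoft C). The register encodings are the standard 8237/8253 ones as I understand them; block mode, the PIT control word used to halt counter 1, and the BIOS refresh values restored at the end are assumptions, not details confirmed by the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port I/O layer: records writes instead of touching hardware. */
typedef struct { uint16_t port; uint8_t value; } PortWrite;
static PortWrite port_log[64];
static size_t port_log_len;

static void port_write(uint16_t port, uint8_t value)
{
    assert(port_log_len < sizeof port_log / sizeof port_log[0]);
    port_log[port_log_len].port = port;
    port_log[port_log_len].value = value;
    port_log_len++;
}

/* Simulated 8237 status register: pretend channel 1 already hit terminal count. */
static uint8_t port_read(uint16_t port)
{
    return port == 0x08 ? 0x02 : 0x00;
}

enum {
    DMA_CH0_ADDR = 0x00, DMA_CH0_COUNT = 0x01,
    DMA_CH1_ADDR = 0x02, DMA_CH1_COUNT = 0x03,
    DMA_COMMAND = 0x08,                 /* write: command, read: status */
    DMA_REQUEST = 0x09, DMA_MASK = 0x0A, DMA_MODE = 0x0B, DMA_CLEAR_FF = 0x0C,
    DMA_PAGE_CH1 = 0x83,                /* assumed shared by channels 0 and 1 */
    PIT_COUNTER1 = 0x41, PIT_CONTROL = 0x43
};

/* Write a 16-bit register as low byte, then high byte. */
static void write16(uint16_t port, uint16_t value)
{
    port_write(DMA_CLEAR_FF, 0);        /* reset the byte-pointer flip-flop */
    port_write(port, (uint8_t)(value & 0xFF));
    port_write(port, (uint8_t)(value >> 8));
}

/* Hypothetical helper: copy `len` bytes from linear `src` to linear `dst`. */
static void dma_mem_to_mem(uint32_t src, uint32_t dst, uint32_t len)
{
    assert(len >= 1 && len <= 0x10000u);
    /* One shared page register: both ranges must stay in one 64 KB page. */
    assert((src >> 16) == (dst >> 16));
    assert(((src + len - 1) >> 16) == (src >> 16) && ((dst + len - 1) >> 16) == (dst >> 16));

    /* 1. Stop refresh: a control word for counter 1 without a new count
     *    halts it (assumed way of "stopping the PIT timer"). */
    port_write(PIT_CONTROL, 0x54);      /* counter 1, LSB only, mode 2 */

    /* 2. Mask channels 0 and 1 so no hardware DREQ interferes. */
    port_write(DMA_MASK, 0x04 | 0);
    port_write(DMA_MASK, 0x04 | 1);

    /* 3. Page (A16-A19), 16-bit addresses, counts (programmed as len - 1). */
    port_write(DMA_PAGE_CH1, (uint8_t)(src >> 16));
    write16(DMA_CH0_ADDR, (uint16_t)src);
    write16(DMA_CH1_ADDR, (uint16_t)dst);
    write16(DMA_CH0_COUNT, (uint16_t)(len - 1));
    write16(DMA_CH1_COUNT, (uint16_t)(len - 1));

    /* 4. Channel 0 reads the source, channel 1 writes the destination. */
    port_write(DMA_MODE, 0x80 | 0x08 | 0);  /* block, read, channel 0 */
    port_write(DMA_MODE, 0x80 | 0x04 | 1);  /* block, write, channel 1 */

    /* 5. Command: bit 0 = memory-to-memory, bit 5 = extended write. */
    port_write(DMA_COMMAND, 0x01 | 0x20);

    /* 6. Software request on channel 0 (not blocked by the mask bits), then
     *    poll for terminal count on channel 1 (status bit 1). */
    port_write(DMA_REQUEST, 0x04 | 0);
    while ((port_read(DMA_COMMAND) & 0x02) == 0) {
        /* busy-wait; reading the status also clears its TC bits */
    }

    /* 7. Restore refresh with assumed PC/XT BIOS values. */
    port_write(DMA_COMMAND, 0x00);      /* late write, mem-to-mem off */
    write16(DMA_CH0_COUNT, 0xFFFF);
    port_write(DMA_MODE, 0x58);         /* single, read, auto-init, channel 0 */
    port_write(DMA_MASK, 0x00);         /* unmask channel 0 */
    port_write(PIT_CONTROL, 0x54);
    port_write(PIT_COUNTER1, 18);       /* restart counter 1: ~15 us refresh interval */
}
```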
Unfortunately, no code has been published.
Effects
- System-to-system copy works fine and is stable.
- CGA-to-CGA copy requires “extended write selection” to handle wait states.
- EGA copy works too (tested with a Video Seven VEGA and an IBM EGA) after tweaking timings and using “extended write selection”, which allows effects such as a sliding copy of an image on the screen.
Performance Characteristics
- CGA-to-CGA DMA copies are slightly faster than the CPU using REP MOVSW or REP MOVSB on video memory: DMA takes ~21.3 cycles per byte, while REP MOVSW within CGA memory takes ~26.5 cycles per byte.
- However, REP MOVSW from system RAM to CGA (~19.2 cycles per byte) is faster than the DMA copy.
- For EGA, the CPU comparison uses REP MOVSB, because the EGA latches don’t work reliably with 16-bit accesses.
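To put these per-byte figures in perspective, a small C helper can turn cycles per byte into wall-clock time. It assumes the standard 4.77 MHz PC/XT clock (the 14.31818 MHz crystal divided by 3) and ignores setup overhead; `copy_time_ms` and `throughput_kib_s` are illustrative names.

```c
#include <assert.h>

/* PC/XT CPU clock: the 14.31818 MHz crystal divided by 3 (~4.77 MHz). */
#define PC_CLOCK_HZ (14318180.0 / 3.0)

/* Milliseconds needed to move `bytes` at `cycles_per_byte` CPU cycles each. */
static double copy_time_ms(double bytes, double cycles_per_byte)
{
    assert(bytes >= 0.0 && cycles_per_byte > 0.0);
    return bytes * cycles_per_byte / PC_CLOCK_HZ * 1000.0;
}

/* Sustained rate in KiB (1024 bytes) per second at `cycles_per_byte`. */
static double throughput_kib_s(double cycles_per_byte)
{
    assert(cycles_per_byte > 0.0);
    return PC_CLOCK_HZ / cycles_per_byte / 1024.0;
}
```

For a 16000-byte CGA screen (320×200, 4 colors) this works out to roughly 71 ms with DMA, 89 ms with REP MOVSW inside CGA memory, and 64 ms with REP MOVSW from system RAM.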
Limitations on Transfer Size
- DRAM specifications typically require every row (about 256 of them) to be refreshed within 4 ms; at the PC/XT’s 4.77 MHz clock, that window is ~19091 CPU cycles. In theory, the maximum safe continuous DMA copy is ~1922 bytes before a standard machine risks crashing from lack of refresh.
- However, many PC/XT setups and memory chips are far more tolerant. In practice, refresh can be disabled long enough to transfer 16 KB or more at a time (e.g., two 8000-byte planes of a 320×200 16-color screen) without parity errors.
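The refresh budget is the same kind of arithmetic. This C sketch converts the 4 ms window into CPU cycles at 4.77 MHz and asks how many bytes fit at a given cost per byte. The cost is left as a parameter because it differs between system RAM and video memory; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PC/XT CPU clock: the 14.31818 MHz crystal divided by 3 (~4.77 MHz). */
#define PC_CLOCK_HZ (14318180.0 / 3.0)

/* CPU cycles that elapse in a window of `ms` milliseconds. */
static double window_cycles(double ms)
{
    assert(ms >= 0.0);
    return ms / 1000.0 * PC_CLOCK_HZ;
}

/* Whole bytes a copy can move within `cycles` at `cycles_per_byte`. */
static uint32_t bytes_in_window(double cycles, double cycles_per_byte)
{
    assert(cycles >= 0.0 && cycles_per_byte > 0.0);
    return (uint32_t)(cycles / cycles_per_byte);
}

/* How many windows of `ms` a copy of `bytes` spans; a value above 1 means
 * refresh is stopped for longer than the whole window. */
static double windows_spanned(double bytes, double cycles_per_byte, double ms)
{
    assert(bytes >= 0.0 && cycles_per_byte > 0.0 && ms > 0.0);
    return bytes * cycles_per_byte / window_cycles(ms);
}
```

At the ~21.3 cycles per byte measured for CGA-to-CGA DMA, only about 896 bytes fit into one 4 ms window, and a 16 KB transfer spans roughly 18 windows, which is why the tolerance of the actual memory chips matters so much.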
Resources
[DoD] The Danger of Datasheets
https://www.os2museum.com/wp/the-danger-of-datasheets/
[MMD] Memory to Memory DMA on the IBM PC PC/XT
https://forum.vcfed.org/index.php?threads/memory-to-memory-dma-on-the-ibm-pc-pc-xt.1256277/