A5E-DMA-Example¶
Overview¶
The DMA example exercises the data path in both directions between the FPGA and the HPS:
- FPGA to HPS: the FPGA XorShift Data Generator produces a data pattern that the RX mSGDMA writes into the HPS DDR. The HPS then checks the written data to confirm that the expected pattern was received.
- HPS to FPGA: the HPS generates an xorshift data pattern and writes it into the HPS DDR. The FPGA TX mSGDMA reads this data and passes it to the XorShift Data Checker, which verifies that the expected data was received.
┌──────────────────────────────┐         ┌────────────────────────┐
│   XorShift Data Generator    │         │       RX mSGDMA        │
│       (pattern source)       ├────────►│    (FPGA → HPS DDR)    │
└──────────────────────────────┘         └───────────┬────────────┘
                                                     │
                                                     │ AXI writes
                                                     ▼
┌────────────────────────────────────────────────────────┐
│                        HPS DDR                         │
│    (shared memory via FPGA-HPS interconnect bridge)    │
└────────────────────────────────────────────────────────┘
                                                     │
                                                     │ AXI reads
                                                     ▼
┌──────────────────────────────┐         ┌────────────────────────┐
│    XorShift Data Checker     │◄────────┤       TX mSGDMA        │
│    (pattern verification)    │         │    (HPS DDR → FPGA)    │
└──────────────────────────────┘         └────────────────────────┘
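The exact xorshift variant implemented by the XorShift Data Generator and Checker is not specified on this page. The following shell sketch assumes Marsaglia's 32-bit xorshift (shifts 13, 17, 5) purely to illustrate why a generator and a checker that share a seed agree on every word, and why a wrong or stale word is caught immediately. The function names are hypothetical and are not part of the example's software.

```shell
#!/bin/sh
# Sketch only: assumes Marsaglia's xorshift32 (shifts 13/17/5). The variant
# implemented in the FPGA XorShift blocks and in dma_example may differ.

# Advance a 32-bit state once and print the next value (decimal).
xorshift32_next() {
    local x=$(( $1 & 0xFFFFFFFF ))
    x=$(( (x ^ (x << 13)) & 0xFFFFFFFF ))
    x=$(( x ^ (x >> 17) ))
    x=$(( (x ^ (x << 5)) & 0xFFFFFFFF ))
    echo "$x"
}

# Generator: print N successive words starting from SEED, as 8-digit hex.
xorshift_generate() {
    local state=$1 n=$2
    while [ "$n" -gt 0 ]; do
        state=$(xorshift32_next "$state")
        printf '%08x\n' "$state"
        n=$((n - 1))
    done
}

# Checker: read hex words on stdin and compare each with the sequence
# regenerated from SEED. Returns 0 if every word matches, 1 on first mismatch.
xorshift_check() {
    local state=$1 word
    while read -r word; do
        state=$(xorshift32_next "$state")
        [ "$word" = "$(printf '%08x' "$state")" ] || return 1
    done
    return 0
}
```

For example, `xorshift_generate 0xa5a5a5a5 16 | xorshift_check 0xa5a5a5a5` succeeds, while checking the same words against a different seed fails on the first word. Because each xorshift step is a bijection on non-zero states, the checker needs only the seed, not a stored copy of the data.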
DMA Data Path - FPGA-to-HPS DDR Bridges¶
The mSGDMA engines reach the HPS DDR through one of the Agilex 5 HPS bridge interfaces. This example can be built to use either the FPGA-to-HPS (F2H) bridge or the FPGA-to-SDRAM (F2SDRAM) bridge. The two bridges have different coherency behavior and therefore require different device-tree configurations. Refer to the "FPGA-to-HPS, HPS-to-FPGA, and FPGA-to-SDRAM Bridges" material in the Agilex 5 SoC FPGA Technical Reference Manual for a full description of each interface.
FPGA-to-HPS (F2H) Bridge - cache coherent (default configuration)¶
The F2H bridge is an ACE5-Lite interface that routes FPGA transactions through the HPS Cache Coherency Unit (CCU). Because traffic passes through the CCU, the FPGA can issue coherent accesses to HPS DDR: reads observe data that is still resident in the A55/A76 caches, and writes are reflected into the caches, with no software cache maintenance required.
To drive the correct ACE5-Lite coherency/snoop attributes, this design instantiates an ACE5-Lite Cache Coherency Translator (CCT) inside the DMA subsystem, in front of the F2H bridge. On the software side, the device-tree overlay marks each mSGDMA node dma-coherent and assigns it an SMMU stream ID via the iommus property:
dma_read: dma-controller@10000 {
	compatible = "altr,socfpga-msgdma";
	...
	iommus = <&smmu 0x101>;
	dma-coherent;
};
dma_write: dma-controller@10040 {
	compatible = "altr,socfpga-msgdma";
	...
	iommus = <&smmu 0x100>;
	dma-coherent;
};
With dma-coherent present, the kernel treats the DMA buffers as coherent and the cl_msgdma module skips the explicit cache flush/invalidate operations - the hardware coherency path keeps the CPU caches and DDR in sync.
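To confirm which configuration the running system actually picked up, you can look for the dma-coherent property on the live device-tree node behind each controller. This sketch assumes each platform device exposes its node through the usual of_node link in sysfs, where every property appears as a file (boolean properties such as dma-coherent as empty files). The function name and its directory parameter are hypothetical.

```shell
#!/bin/sh
# Report whether each mSGDMA controller's live device-tree node carries the
# dma-coherent property. DEV_DIR defaults to the sysfs platform device
# directory; it is a parameter only so the check can be run against a copy.
dma_coherency_report() {
    local dev_dir=${1:-/sys/bus/platform/devices} dev node
    for dev in 20010000.dma-controller 20010040.dma-controller; do
        node="$dev_dir/$dev/of_node"
        if [ ! -d "$node" ]; then
            echo "$dev: not found"
        elif [ -e "$node/dma-coherent" ]; then
            echo "$dev: coherent (F2H)"
        else
            echo "$dev: non-coherent (F2SDRAM)"
        fi
    done
}
```

On the target, run `dma_coherency_report` with no arguments; both controllers should report the same configuration.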
FPGA-to-SDRAM (F2SDRAM) Bridge - non-coherent¶
The F2SDRAM bridge connects the FPGA more directly to the HPS hard memory controller, bypassing the CCU. It is non-coherent: the CPU caches are not snooped, so software must perform cache maintenance around every DMA transfer.
To use the F2SDRAM path, remove both the dma-coherent and iommus lines from each mSGDMA node in the overlay:
dma_read: dma-controller@10000 {
	compatible = "altr,socfpga-msgdma";
	...
	/* no iommus property */
	/* no dma-coherent property */
};
With those two properties removed, the kernel marks the buffers non-coherent and the cl_msgdma module performs the required CPU cache maintenance (flush before the FPGA reads, invalidate after the FPGA writes) so that the data stays consistent in software.
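If you switch between the two configurations often, the overlay edit can be scripted. This sketch assumes the properties appear one per line exactly as in the snippets above; the function name is hypothetical, and the path to the overlay source inside software/dma_example/kernel_dts is left for you to supply.

```shell
#!/bin/sh
# Remove every "iommus = <&smmu ...>;" and "dma-coherent;" line from an overlay
# source, switching the mSGDMA nodes to the non-coherent F2SDRAM configuration.
# The original file is kept as DTS.bak.
make_overlay_noncoherent() {
    local dts=$1
    cp "$dts" "$dts.bak" || return 1
    sed -e '/^[[:space:]]*iommus[[:space:]]*=[[:space:]]*<&smmu/d' \
        -e '/^[[:space:]]*dma-coherent;/d' "$dts.bak" > "$dts"
}
```

Rebuild the overlay with make afterwards. Remember that this only covers the software half: the FPGA design must also be rebuilt to route the mSGDMA masters to the F2SDRAM port.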
Note: selecting the F2SDRAM path is not purely a device-tree change - the FPGA design must also be built to route the mSGDMA masters to the F2SDRAM port. The overlay edit above is the software-visible half of that configuration.
When wiring the FPGA design to the F2SDRAM port, the f2sdram_adapter_256_hw.tcl bridge adapter must be placed between the F2SDRAM bridge port and the mSGDMA masters. Both the F2SDRAM port and the adapter are AXI4 interfaces; the adapter is an AXI4-to-AXI4 pass-through that forces the AXI sideband signals (notably the awuser/aruser user bits, along with awcache/arcache and awprot/arprot) to the fixed values the F2SDRAM port expects for non-coherent access. The interconnect auto-generated for the mSGDMA masters leaves those user bits at 0, which the F2SDRAM port mishandles - so without the adapter the mSGDMA descriptors complete but data is silently dropped on writes or returned scrambled on reads.
Prerequisites¶
Before building and running the example you will need:
- A Mity-A5E Development Board. The FPGA project and the per-board JIC differ between platforms, so build for the board you have.
- A Linux build host with Intel Quartus Prime Pro 25.3.1 installed, for compiling the FPGA design. See Building_fpga_2531pro.
- The MitySOM-A5E Yocto SDK toolchain (environment-setup-armv8-2a-mitysom-linux), used to cross-compile the kernel, device-tree overlay, kernel module, and userspace application. The toolchain ships alongside the prebuilt SD card images and is also in the Files section of Redmine.
- The Linux kernel source (linux-socfpga). See Linux_Kernel.
- A microSD card programmed with a MitySOM-A5E image, either a prebuilt image or one you have built yourself.
Building the FPGA Design DMA Example¶
Compile the FPGA design¶
Refer to Building_fpga_2531pro for instructions on building the FPGA design. Before compiling, navigate into the mitysom-a5e[-mini]-ref-dma example project.
- Flash the resulting a5e.hps.jic onto the hardware.
- Replace the a5e.core.rbf and boot.scr on the SD card with the newly built files.
Generate the FPGA design headers¶
Still in the FPGA project directory, run the following to generate the header files that describe the FPGA design:
make generate_headers
Compile the HPS software support¶
To compile the HPS software support, download and install the MitySOM-A5E Yocto SDK toolchain, which is provided alongside the prebuilt SD card images and in the Files section of Redmine. Once it is installed, source its environment script:
source /path/to/installed/toolchain/environment-setup-armv8-2a-mitysom-linux
Compile the Linux Kernel, DTS, and Modules¶
The Linux kernel source is also needed to compile the device-tree overlay and the kernel module. Because the module is built out of tree, it must match the kernel running on the target, so check out the Linux kernel source, compile it, and install it onto the SD card by following the instructions in Linux_Kernel. Build the kernel Image, dtbs, and modules, and install all of them onto the SD card as described there.
The toolchain sourced above can be used to compile the kernel. Once the kernel is compiled, set the following environment variable to point to the kernel source tree:
export KERNEL_SRC=/path/to/linux-socfpga
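A quick check before building can catch a shell in which the environment script was never sourced or KERNEL_SRC points at the wrong place. This sketch assumes the SDK environment script exports CC and CROSS_COMPILE, as Yocto SDK scripts commonly do; adjust the variable list if yours differs. The function name is hypothetical.

```shell
#!/bin/sh
# Fail early if the cross toolchain or the kernel source tree is not set up.
check_build_env() {
    local missing=0 var val
    for var in CC CROSS_COMPILE KERNEL_SRC; do
        eval "val=\${$var:-}"            # indirect lookup of $var's value
        if [ -z "$val" ]; then
            echo "error: $var is not set" >&2
            missing=1
        fi
    done
    if [ -n "${KERNEL_SRC:-}" ] && [ ! -f "$KERNEL_SRC/Makefile" ]; then
        echo "error: KERNEL_SRC=$KERNEL_SRC does not look like a kernel tree" >&2
        missing=1
    fi
    return "$missing"
}
```

Run `check_build_env` in the same shell in which you sourced the toolchain and exported KERNEL_SRC.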
Alternative - match the kernel module to the installed kernel without recompiling¶
If you do not wish to recompile the kernel, you can build only the cl_msgdma module against a Linux source tree checked out to match the kernel already installed on the SD card. An out-of-tree module only loads if its vermagic string matches the running kernel, so the source must be at the same commit and configured the same way.
On the target, read the running kernel version. The release string ends in the source git hash:
root@mity-a5e:~# uname -a
Linux mity-a5e 6.12.33-yocto-standard-g991d74eafd10 ...
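Here the release string 6.12.33-yocto-standard-g991d74eafd10 ends in -g991d74eafd10, so the git hash is 991d74eafd10 (the leading g is a prefix added by the kernel's version script and is not part of the hash).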
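Extracting the hash can be scripted with shell parameter expansion. This minimal sketch assumes the release string ends in -g followed by a hexadecimal hash, as in the example above; the function name is hypothetical.

```shell
#!/bin/sh
# Print the git hash embedded in a kernel release string such as
# "6.12.33-yocto-standard-g991d74eafd10". Fails if there is no "-g<hash>" suffix.
kernel_git_hash() {
    local rel=$1 hash
    hash=${rel##*-g}                      # strip everything up to the last "-g"
    case $hash in
        "$rel"|""|*[!0-9a-f]*) return 1 ;; # no suffix, empty, or not hex
    esac
    echo "$hash"
}
```

On the build host, `git checkout "$(kernel_git_hash 6.12.33-yocto-standard-g991d74eafd10)"` checks out the same commit as the command below; on the target, `kernel_git_hash "$(uname -r)"` prints the hash directly.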
Check out the linux-socfpga source at that commit:
cd linux-socfpga
git checkout 991d74eafd10
Reproduce the kernel release string and prepare the tree for external module builds. Create a localversion file holding the -yocto-standard suffix so the vermagic release string matches exactly, then run modules_prepare:
echo "-yocto-standard" > localversion
make mity_a5e_devkit_defconfig
make modules_prepare
Build the module against the prepared tree:
cd software/dma_example/kernel_module
KERNEL_SRC=/path/to/linux-socfpga make
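Before copying the module to the target, you can check that the prepared tree produces exactly the release string of the running kernel. This sketch assumes make modules_prepare has written include/config/kernel.release in the tree (the kernel's prepare step generates it); pass the target's uname -r output as the second argument. The function name is hypothetical.

```shell
#!/bin/sh
# Compare the release string of a prepared kernel tree with the running
# kernel's "uname -r". The module's vermagic starts with the tree's release
# string, so a mismatch here means insmod will reject the module.
check_kernel_release() {
    local tree=$1 target_release=$2 rel_file tree_release
    rel_file="$tree/include/config/kernel.release"
    if [ ! -f "$rel_file" ]; then
        echo "error: $rel_file missing; run 'make modules_prepare' first" >&2
        return 2
    fi
    tree_release=$(cat "$rel_file")
    if [ "$tree_release" = "$target_release" ]; then
        echo "match: $tree_release"
    else
        echo "mismatch: tree=$tree_release target=$target_release" >&2
        return 1
    fi
}
```

A matching release string is necessary but not sufficient: the configuration must also match, which is why the same defconfig is used above.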
Compile the Kernel DTS Overlay¶
Navigate into the software/dma_example/kernel_dts directory of the FPGA project and run the following to compile the device-tree overlay:
make
Copy the resulting dma_example.dtbo into the /boot directory of the SD card's root filesystem (ext partition).
Compile the Kernel Module¶
Navigate into the software/dma_example/kernel_module directory of the FPGA project and run the following to compile the kernel module:
make
Copy the resulting cl_msgdma.ko into the /root directory of the SD card's root filesystem (ext partition).
Compile the Userspace Application¶
Navigate into the software/dma_example/userspace directory of the FPGA project and run the following to compile the userspace application:
make
Copy the resulting dma_example binary into the /root directory of the SD card's root filesystem (ext partition).
Deployment Summary¶
The example is made up of the build artifacts below, which must be deployed together. Rebuilding one without the others can leave a mismatched set on the target, so after any change redeploy the whole set. The table summarizes what each artifact is and where it goes on the target.
| Artifact | Built by | Destination on target |
|---|---|---|
| a5e.hps.jic | FPGA design build | Programmed to QSPI flash - see Building_sd_2531pro |
| a5e.core.rbf | FPGA design build | /lib/firmware/a5e.core.rbf - loaded into the FPGA at boot by boot.scr |
| boot.scr | FPGA design build (from scripts/boot.cmd) | SD card U-Boot script - loads the RBF and kernel and applies the overlay |
| dma_example.dtbo | software/dma_example/kernel_dts | /boot/dma_example.dtbo - applied at boot by boot.scr |
| cl_msgdma.ko | software/dma_example/kernel_module | /root/cl_msgdma.ko |
| dma_example | software/dma_example/userspace | /root/dma_example |
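The RBF and the three software artifacts all live on the SD card's ext (root filesystem) partition, so they can be copied in one step. This sketch assumes that partition is mounted on the build host (the mount point is yours to supply) and that it is run from the FPGA project directory. The RBF path is passed in because its location in the build output is not listed here. boot.scr and the JIC are deployed separately as described above. The function name is hypothetical.

```shell
#!/bin/sh
# Copy the DMA-example artifacts into a mounted SD card root filesystem,
# following the destinations in the table above.
#   $1: path to the built a5e.core.rbf
#   $2: mount point of the SD card ext partition
deploy_dma_example() {
    local rbf=$1 rootfs=$2 sw=software/dma_example f
    # Refuse to deploy a partial set: every artifact must exist first.
    for f in "$rbf" "$sw/kernel_dts/dma_example.dtbo" \
             "$sw/kernel_module/cl_msgdma.ko" "$sw/userspace/dma_example"; do
        [ -f "$f" ] || { echo "error: missing $f" >&2; return 1; }
    done
    mkdir -p "$rootfs/lib/firmware" "$rootfs/boot" "$rootfs/root"
    cp "$rbf" "$rootfs/lib/firmware/a5e.core.rbf"
    cp "$sw/kernel_dts/dma_example.dtbo" "$rootfs/boot/"
    cp "$sw/kernel_module/cl_msgdma.ko" "$rootfs/root/"
    cp "$sw/userspace/dma_example" "$rootfs/root/"
    chmod +x "$rootfs/root/dma_example"
    sync    # flush writes before the card is unmounted
}
```

Depending on how the card is mounted, you may need to run this as root.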
Running the DMA Example¶
Boot into Linux¶
Boot from the modified SD card and log in at the Linux prompt as root.
At boot, boot.scr loads a5e.core.rbf into the FPGA, then loads and applies dma_example.dtbo on top of the base device tree. This is why the DMA-example boot.scr must be the one deployed - it is what pulls in the overlay. You can confirm the overlay was applied by checking that the DMA controllers it adds are present:
root@mity-a5e:~# ls /sys/bus/platform/devices/ | grep dma-controller
20010000.dma-controller
20010040.dma-controller
Load the custom kernel module¶
insmod cl_msgdma.ko
Successful load should look like the following:
root@mity-a5e:~# insmod cl_msgdma.ko
[ 180.513657] cl_msgdma: loading out-of-tree module taints kernel.
[ 180.521379] cl-msgdma write-cl-msgdma: registered /dev/cl_msgdma_rx
[ 180.527737] cl-msgdma write-cl-msgdma: CL mSGDMA driver probe success
[ 180.535056] cl-msgdma read-cl-msgdma: registered /dev/cl_msgdma_tx
[ 180.541309] cl-msgdma read-cl-msgdma: CL mSGDMA driver probe success
After a successful load, the two character devices used by the application should be present:
root@mity-a5e:~# ls /dev/cl_msgdma_*
/dev/cl_msgdma_rx
/dev/cl_msgdma_tx
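The overlay and module checks above can be combined into a preflight step, for instance at the top of a test script. This sketch only tests for the two platform devices added by the overlay and the two character devices created by the module. The function name and its optional path-prefix parameter (used so it can be pointed at a copy of the layout) are hypothetical.

```shell
#!/bin/sh
# Verify that the overlay and the cl_msgdma module are in place.
# $1 (optional): prefix prepended to the /sys and /dev paths; empty on target.
dma_preflight() {
    local root=${1:-} ok=0 dev node
    for dev in 20010000.dma-controller 20010040.dma-controller; do
        if [ ! -e "$root/sys/bus/platform/devices/$dev" ]; then
            echo "missing $dev: overlay not applied?" >&2
            ok=1
        fi
    done
    for node in cl_msgdma_rx cl_msgdma_tx; do
        if [ ! -c "$root/dev/$node" ]; then     # must be a character device
            echo "missing /dev/$node: is cl_msgdma loaded?" >&2
            ok=1
        fi
    done
    return "$ok"
}
```

On the target, `dma_preflight && ./dma_example` runs the application only when everything it needs is present.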
Run the userspace application¶
./dma_example
By default the application runs both directions with verification enabled for 1000 iterations. It first prints a Run Configuration block echoing the active settings, then the per-phase progress lines, and finishes with a Test PASSED verdict. Successful execution should look like the following:
root@mity-a5e:~# ./dma_example
Run Configuration:
buf_size: 4194304
num_bufs: 8
num_iters: 1000
rx_enabled: 1
tx_enabled: 1
verify: 1
bandwidth: 0
rx_seed: 0xa5a5a5a5
tx_seed: 0x5a5a5a5a
==========================
Starting DMA Test
Allocating RX buffers
Allocating TX buffers
Queuing all RX buffers
Setting up XorShift generator
Setting up XorShift checker
Priming TX buffers
Starting the RX DMA Engine
Starting the TX DMA Engine
Running DMA loop
Read 1000 buffers from DMA
Data RX check passed
Wrote 1000 buffers to DMA
Data TX check passed
Test PASSED
The final Test PASSED line is the all-good signal. Any Data RX check failed / Data TX check failed line, or a final Test FAILED, indicates a data mismatch.
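For scripted runs, the verdict can be taken from the output instead of read by eye. This sketch relies only on the message strings quoted above (Test PASSED, Test FAILED, and the Data RX/TX check failed lines). Whether dma_example also returns a non-zero exit status on failure is not documented here, so the output is checked instead. The function name is hypothetical.

```shell
#!/bin/sh
# Read dma_example output on stdin, echo it through unchanged, and return 0
# only if the run reported "Test PASSED" and no data-check failures.
dma_verdict() {
    local log rc
    log=$(mktemp) || return 2
    tee "$log"
    if grep -q 'check failed' "$log" || grep -q 'Test FAILED' "$log"; then
        rc=1
    elif grep -q 'Test PASSED' "$log"; then
        rc=0
    else
        rc=1    # no verdict at all, e.g. the run hung or was interrupted
    fi
    rm -f "$log"
    return "$rc"
}
```

For example, `./dma_example -q | dma_verdict && echo "run OK"` prints the application output followed by run OK only on a passing run.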
Command-line options¶
The application accepts several flags to control buffer sizing, direction, verification, and throughput reporting. Run with --help to see the full usage:
root@mity-a5e:~# ./dma_example --help
Usage: ./dma_example [options]
  -s, --buf-size BYTES  buffer size, multiple of PAGE_SIZE [default: 4194304]
  -i, --iters N         buffers to push through each direction [default: 1000]
  -n, --num-bufs N      ring buffers per direction (> 0) [default: 8]
  -r, --rx-seed HEX     xorshift seed for RX pattern [default: 0xa5a5a5a5]
  -t, --tx-seed HEX     xorshift seed for TX pattern [default: 0x5a5a5a5a]
  -R, --rx-only         run RX phase only
  -T, --tx-only         run TX phase only
  -N, --no-verify       skip CPU-side xorshift verify/generate (raw bandwidth)
  -B, --bandwidth       print per-direction throughput on completion
  -q, --quiet           suppress progress logging
A few notes on the most useful flags:
- --bandwidth adds an Elapsed summary line with the measured per-direction throughput.
- --no-verify skips the CPU-side xorshift generate/check and reports raw descriptor-handshake throughput only. A passing --no-verify run does not prove that the data was transferred correctly; always confirm with a verifying run (the default, without --no-verify) on the same build.
- --rx-only and --tx-only exercise a single direction: --rx-only runs only the FPGA → HPS DDR path, and --tx-only runs only the HPS DDR → FPGA path.
- --quiet suppresses the progress logging, leaving only the Elapsed summary (when --bandwidth is set) and the final PASS/FAIL verdict, which is handy for scripted test loops.
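Because --buf-size must be a multiple of PAGE_SIZE, a helper that rounds a requested size up to the next page boundary avoids rejected runs when sweeping sizes. This sketch takes the page size from getconf PAGESIZE unless one is passed in; the function name is hypothetical and not part of dma_example.

```shell
#!/bin/sh
# Round BYTES up to a multiple of the page size (default: this system's).
round_up_to_page() {
    local bytes=$1 page=${2:-$(getconf PAGESIZE)}
    echo $(( (bytes + page - 1) / page * page ))
}
```

For example, with 4096-byte pages `./dma_example -s "$(round_up_to_page 1000000)" -B` runs with 1003520-byte buffers (245 pages). For scale, the defaults move 4194304 × 1000 = 4,194,304,000 bytes (about 3.9 GiB) through each direction.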
Note on TX throughput on SR0 silicon: the TX path (the FPGA reading from HPS DDR) is a coherent read through the F2H bridge, and on SR0 silicon it is limited by a known HPS read-throughput erratum (documented by Altera as "HPS EMIF read throughput less than target"). On SR0 parts the TX number reported by --bandwidth will be significantly lower than RX; the RX (FPGA-write) path is not affected. This is a silicon limitation of the cache-coherent F2H read path, not a defect in the design or the example.
The mSGDMAs can hold data in their FIFOs and keep buffers from a previous run, which can corrupt the data seen by the next run. Reloading the mSGDMA drivers resets the mSGDMAs and reallocates all of the buffers in a clean state. Use the following script to run the example again, or simply reboot the unit and start again from Boot into Linux above.
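#!/bin/bash
# Reset both mSGDMA engines and rerun the example from a clean state.
cd /root || exit 1      # cl_msgdma.ko and dma_example are deployed here
rmmod cl_msgdma         # unload by module name
# Unbind and rebind the mSGDMA platform devices to reset the engines
echo 20010000.dma-controller > /sys/bus/platform/drivers/altera-msgdma/unbind
echo 20010040.dma-controller > /sys/bus/platform/drivers/altera-msgdma/unbind
echo 20010000.dma-controller > /sys/bus/platform/drivers/altera-msgdma/bind
echo 20010040.dma-controller > /sys/bus/platform/drivers/altera-msgdma/bind
insmod cl_msgdma.ko
./dma_example "$@"      # pass any options through, e.g. -B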
Troubleshooting¶
/dev/cl_msgdma_rx and /dev/cl_msgdma_tx do not exist¶
The kernel module creates these character devices when it probes against the DMA controller nodes added by dma_example.dtbo. If they are missing:
- Confirm the overlay was applied: ls /sys/bus/platform/devices/ | grep dma-controller should list 20010000.dma-controller and 20010040.dma-controller. If it does not, the FPGA design was likely built without the DMA example, or the DMA-example boot.scr and dma_example.dtbo were not deployed. Rebuild the FPGA design for the DMA example and redeploy boot.scr and the overlay.
- Confirm the module loaded: check dmesg for the CL mSGDMA driver probe success lines shown above.
insmod: ERROR: could not insert module: Invalid module format¶
The module's vermagic does not match the running kernel. The module must be built against a kernel source tree at the same commit and configuration as the kernel installed on the SD card. Either rebuild and install the kernel from your source tree, or follow the "match the kernel module to the installed kernel without recompiling" procedure above.
The application hangs or reports errors on the second and later runs¶
The mSGDMA engines can retain state and buffers from a previous run. Each dma_example invocation should be preceded by the driver reload/rebind cycle so the engines reset cleanly - use the restart script shown above between every run, not just between different test scenarios. If a run hangs, reload with that cycle and try again.
dmesg module names look reversed relative to the device nodes¶
This is expected. The dmesg probe messages name the engines from the FPGA point of view, while the character devices are named from the HPS point of view, so they appear swapped:
- write-cl-msgdma registers /dev/cl_msgdma_rx (the FPGA writes HPS DDR).
- read-cl-msgdma registers /dev/cl_msgdma_tx (the FPGA reads HPS DDR).