Memory Interfaces in Vitis HLS
This page explains how the PySilicon memory model maps to HLS/Vitis memory interfaces. Two interface styles are covered:
- AXI master (m_axi): byte-addressed, used for off-chip DDR or PYNQ buffers.
- Local array / BRAM-like: word-indexed, used for on-chip storage or direct BRAM ports.
The histogram example is the main reference throughout.
AXI master interface (m_axi)
An AXI master port gives the kernel read/write access to a flat, byte-addressed memory space. In Vitis HLS, the interface pragma is:
#pragma HLS INTERFACE m_axi port=mem offset=slave bundle=gmem
The kernel receives a pointer to the base of the memory:
// From hist.cpp / hist.hpp
using mem_word_t = ap_uint<mem_dwidth>;
void hist(hls::stream<axis_word_t>& in_stream,
          hls::stream<axis_word_t>& out_stream,
          mem_word_t* mem);
Although the AXI4 hardware interface is byte-addressed, the C++ pointer is word-addressed: mem[i] refers to the i-th mem_word_t word, not the i-th byte. Command metadata carries byte addresses, so each one must be converted to a word index before indexing into mem.
Byte addresses in command metadata
The HistCmd descriptor stores three byte addresses — one for each memory region:
// From hist.cpp
const int ndata = static_cast<int>(cmd.ndata);
const int nbins = static_cast<int>(cmd.nbins);
// cmd.data_addr, cmd.bin_edges_addr, cmd.cnt_addr are byte addresses
These addresses are the values produced by Memory.alloc() on the Python side (with addr_unit=AddrUnit.byte).
Alignment check
Before using any address, the kernel checks that it is aligned to the word boundary:
// From hist.cpp
if (!memmgr::is_word_aligned<mem_dwidth>(cmd.data_addr) ||
    !memmgr::is_word_aligned<mem_dwidth>(cmd.bin_edges_addr) ||
    !memmgr::is_word_aligned<mem_dwidth>(cmd.cnt_addr)) {
    resp.status = HistError::ADDRESS_ERROR;
    resp.write_axi4_stream<stream_dwidth>(out_stream, true);
    return;
}
memmgr::is_word_aligned<W>(addr) checks that addr % (W / 8) == 0.
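The check described above can be sketched in plain C++ as follows. This is only an illustration of the modulo test: `is_word_aligned_sketch` is a hypothetical stand-in, and the real helper in memmgr.hpp may use a different signature or address type (here a `std::uint64_t` is assumed).

```cpp
#include <cstdint>

// Hypothetical stand-in for memmgr::is_word_aligned<W>.
// W is the memory word width in bits; byte_addr is a byte address.
template <int W>
constexpr bool is_word_aligned_sketch(std::uint64_t byte_addr) {
    static_assert(W % 8 == 0, "word width must be a whole number of bytes");
    // An address is word-aligned when it is a multiple of the word size in bytes.
    return byte_addr % (W / 8) == 0;
}
```

For a 64-bit memory word, byte address 16 passes the check, while byte address 12 does not because it falls in the middle of word 1.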
Converting byte addresses to word indices
After the alignment check, each byte address is converted to a word index:
// From hist.cpp
const int data_word_idx = memmgr::byte_addr_to_word_index<mem_dwidth>(cmd.data_addr);
const int edge_word_idx = memmgr::byte_addr_to_word_index<mem_dwidth>(cmd.bin_edges_addr);
const int count_word_idx = memmgr::byte_addr_to_word_index<mem_dwidth>(cmd.cnt_addr);
byte_addr_to_word_index<W>(addr) computes addr / (W / 8). These helpers are in memmgr.hpp, which is included automatically when you use the PySilicon build utilities.
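As a worked example of the arithmetic, the sketch below implements the division and its inverse in plain C++. Both function names are hypothetical, and a `std::uint64_t` address type is assumed; the actual memmgr.hpp helpers may differ.

```cpp
#include <cstdint>

// Hypothetical stand-in for memmgr::byte_addr_to_word_index<W>.
// Assumes byte_addr has already passed the alignment check.
template <int W>
constexpr std::uint64_t byte_addr_to_word_index_sketch(std::uint64_t byte_addr) {
    return byte_addr / (W / 8);
}

// Inverse mapping, as a testbench would need when it allocates in words
// but must report byte addresses in the command.
template <int W>
constexpr std::uint64_t word_index_to_byte_addr_sketch(std::uint64_t word_idx) {
    return word_idx * (W / 8);
}
```

With 64-bit words (8 bytes each), byte address 24 maps to word index 3, and word index 3 maps back to byte address 24. With 32-bit words the same byte address maps to word index 6.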
Serialized reads and writes
With the word indices in hand, the kernel uses the generated array utilities to read data from and write data to the AXI memory port:
// From hist.cpp
// Read input data and bin edges into local buffers
float32_array_utils::read_array<mem_dwidth>(mem + data_word_idx, data_buf, ndata);
if (nbins > 1) {
    float32_array_utils::read_array<mem_dwidth>(mem + edge_word_idx, edge_buf, nbins - 1);
}
// ... compute histogram ...
// Write output counts back to memory
uint32_array_utils::write_array<mem_dwidth>(count_buf, mem + count_word_idx, nbins);
The read_array and write_array helpers handle packing/unpacking between the element type (e.g., float, ap_uint<32>) and the mem_word_t words of the AXI port. They are generated by gen_array_utils on the Python side and placed in the include/ directory.
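To give a feel for what such packing involves, here is a minimal sketch that packs 32-bit floats into 64-bit words and unpacks them again. It is not the generated code: the function names are hypothetical, `std::uint64_t` stands in for mem_word_t, and the layout (element 0 in the low 32 bits of each word) is an assumption that may not match what gen_array_utils emits.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Pack floats two per 64-bit word; even elements go in the low half (assumed layout).
std::vector<std::uint64_t> pack_floats_sketch(const std::vector<float>& elems) {
    std::vector<std::uint64_t> words((elems.size() + 1) / 2, std::uint64_t{0});
    for (std::size_t i = 0; i < elems.size(); ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &elems[i], sizeof(bits));  // copy the raw IEEE-754 bit pattern
        words[i / 2] |= static_cast<std::uint64_t>(bits) << (32 * (i % 2));
    }
    return words;
}

// Recover n floats from the packed words, using the same assumed layout.
std::vector<float> unpack_floats_sketch(const std::vector<std::uint64_t>& words,
                                        std::size_t n) {
    std::vector<float> elems(n);
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t bits =
            static_cast<std::uint32_t>(words[i / 2] >> (32 * (i % 2)));
        std::memcpy(&elems[i], &bits, sizeof(bits));
    }
    return elems;
}
```

Three floats occupy two 64-bit words in this layout, with the upper half of the last word left as zero padding. This is why region sizes are expressed in words rather than elements.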
BRAM / local-array style interfaces
For on-chip BRAM or local-array style memory, the kernel uses word indices directly rather than byte addresses. There is no m_axi pragma; the interface is either a direct array argument or an internal static array.
A typical pattern:
static float lut[TABLE_SIZE];
// ... fill lut at initialization or from a separate port ...
float result = lut[word_index];
On the Python side, model this with AddrUnit.word:
lut_mem = Memory(word_size=32, addr_size=16, addr_unit=AddrUnit.word)
lut_addr = lut_mem.alloc(TABLE_SIZE) # returns a word index, e.g. 0
lut_mem.write(lut_addr, packed_lut)
Because addresses are word indices in this mode, no byte-to-word conversion is needed. The address returned by alloc() can be used directly as a C array index in the testbench.
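The pattern above shows the internal static array case. The direct array argument variant can be sketched as follows, in plain C++. The function name and the table depth of 16 are hypothetical, and the interface pragma mentioned in the comment is one possible choice in Vitis HLS rather than something taken from the PySilicon sources.

```cpp
#include <cassert>

constexpr int TABLE_SIZE = 16;  // hypothetical table depth

// Word-indexed lookup through an array argument: no byte addresses involved.
float lut_lookup_sketch(const float lut[TABLE_SIZE], int word_index) {
    // In Vitis HLS, an array argument like this could be mapped to a memory
    // port with, for example: #pragma HLS INTERFACE ap_memory port=lut
    assert(word_index >= 0 && word_index < TABLE_SIZE);  // simulation-only bounds check
    return lut[word_index];
}
```

The index passed to the function is the same kind of value that alloc() returns under AddrUnit.word, so a testbench can forward it unchanged.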
The key difference from the AXI master style:
| Property | AddrUnit.byte / AXI master | AddrUnit.word / BRAM-like |
|---|---|---|
| alloc() returns | byte address | word index |
| HLS interface pragma | #pragma HLS INTERFACE m_axi | none (local array or BRAM port) |
| Address conversion | required (byte_addr_to_word_index) | not needed |
| Alignment requirement | must be word-aligned | always valid (integer index) |
C++ testbench mirroring the Python flow
The C++ testbench (hist_tb.cpp) mirrors the Python simulate() method exactly so that the same addresses are valid on both sides.
Python side (hist_demo.py)
# Byte-addressed memory matching the AXI interface
mem = Memory(word_size=MEM_DWIDTH, addr_size=MEM_AWIDTH, addr_unit=AddrUnit.byte)
# Allocate regions and write inputs
data_addr = mem.alloc(nwords_data)
edge_addr = mem.alloc(nwords_edges)
count_addr = mem.alloc(nwords_counts)
mem.write(data_addr, write_array(data, elem_type=Float32, word_bw=mem.word_size))
mem.write(edge_addr, write_array(bin_edges, elem_type=Float32, word_bw=mem.word_size))
# Addresses go into the command descriptor
cmd.data_addr = data_addr
cmd.bin_edges_addr = edge_addr
cmd.cnt_addr = count_addr
C++ testbench side (hist_tb.cpp)
// Flat memory buffer
static mem_word_t mem[MEM_SIZE] = {};
pysilicon::memmgr::MemMgr<mem_dwidth> mgr(mem, MEM_SIZE);
// Allocate regions — same first-fit algorithm as Python Memory.alloc
const int data_word_idx = mgr.alloc(nwords_data);
const int edge_word_idx = mgr.alloc(nwords_edges);
const int count_word_idx = mgr.alloc(nwords_count);
// Convert word indices to byte addresses for the HistCmd
const int bytes_per_word = mem_dwidth / 8;
const ap_uint<mem_awidth> data_byte_addr = data_word_idx * bytes_per_word;
const ap_uint<mem_awidth> edge_byte_addr = edge_word_idx * bytes_per_word;
const ap_uint<mem_awidth> count_byte_addr = count_word_idx * bytes_per_word;
// Write input data into the flat buffer
float32_array_utils::write_array<mem_dwidth>(data_buf, mem + data_word_idx, ndata);
if (nbins > 1) {
    float32_array_utils::write_array<mem_dwidth>(edge_buf, mem + edge_word_idx, nbins - 1);
}
// Build and send the command
HistCmd cmd;
cmd.data_addr = data_byte_addr;
cmd.bin_edges_addr = edge_byte_addr;
cmd.cnt_addr = count_byte_addr;
MemMgr (from memmgr_tb.hpp) uses the same first-fit algorithm as Memory.alloc(). As long as both sides request the same region sizes in the same order, the byte addresses computed in the C++ testbench match those produced by the Python model.
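To illustrate why matching allocation sequences yield matching addresses, here is a minimal first-fit sketch over a list of free blocks. The class `FirstFitSketch` is hypothetical and is not the MemMgr implementation; it only shows that first-fit is deterministic for a given request sequence.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical first-fit allocator working in units of memory words.
class FirstFitSketch {
public:
    explicit FirstFitSketch(int total_words) { free_.push_back({0, total_words}); }

    // Returns the starting word index of the first free block that is large
    // enough, or -1 if no block fits.
    int alloc(int nwords) {
        for (std::size_t i = 0; i < free_.size(); ++i) {
            if (free_[i].size >= nwords) {
                const int start = free_[i].start;
                free_[i].start += nwords;  // shrink the block from the front
                free_[i].size -= nwords;
                if (free_[i].size == 0) {
                    free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
                }
                return start;
            }
        }
        return -1;
    }

private:
    struct Block { int start; int size; };
    std::vector<Block> free_;
};
```

Two instances that receive the requests 8, 4, and 16 words in that order both return word indices 0, 8, and 12. With 64-bit words, multiplying by 8 bytes per word gives byte addresses 0, 64, and 96 on both sides.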
Design guidance
Use AddrUnit.byte / m_axi when:
- The hardware will access off-chip DDR (PYNQ, Alveo, etc.).
- Addresses are communicated as AXI4 byte offsets in command metadata.
- The memory space is large and sparsely accessed.
Use AddrUnit.word / local array when:
- The data fits on-chip and will be stored in BRAM or registers.
- The kernel accesses elements by integer index, not byte address.
- You want to avoid byte-to-word address conversion in the synthesized code.