Array Transfer Interface

ArrayTransferIF is a logical interface that carries a variable-length array of typed elements between simulation components. The element type can be any DataSchema subclass — a scalar field (Float32, U8), a composite DataList, or a DataArray.

ArrayTransferIF generalises SchemaTransferIF: a single-element array transfer with count=1 is equivalent to a schema transfer, and the implementations share the same PhysicalTransport layer.

Application layer:   Component.write(elements)    rx_proc(elements) / get(count)
                           │                                  │
Logical layer:   ArrayTransferIFMaster              ArrayTransferIFSlave
                           │                                  │
Transport layer:     PhysicalTransport       (StreamTransport | …)
                           │                                  │
Physical layer:    StreamIFMaster                    StreamIFSlave

Classes

Class	Role
`ArrayTransferIFMaster`	Serializes an element list → one word burst → transport
`ArrayTransferIFSlave`	Receives a word burst → deserializes → delivers via `rx_proc` or `get(count)`
`ArrayTransferIF`	Optional logical container; validates endpoint types, `element_type`, and `bitwidth`

PhysicalTransport and StreamTransport are shared with SchemaTransferIF; see the Schema Transfer Interface page.

ArrayTransferIFMaster

Parameter	Type	Default	Meaning
`transport`	`PhysicalTransport`	—	Physical layer to transmit through
`element_type`	`type[DataSchema]`	—	Schema class for each element
`bitwidth`	`int`	`32`	Word width for serialization

master = ArrayTransferIFMaster(
    sim=sim,
    transport=transport,
    element_type=Float32,
    bitwidth=32,
)

Usage — from inside a run_proc:

def run_proc(self) -> ProcessGen[None]:
    samples = np.array([1.0, -2.5, 3.14], dtype=np.float32)
    yield from self.arr_master.write(samples)

write(elements) serializes each element with element_type.nwords_per_inst(bitwidth) words, concatenates them into one burst, and forwards the burst to the transport as a single AXI-Stream packet (TLAST asserted on the last word).

Numpy fast path — when element_type is a scalar FloatField or IntField and elements is a np.ndarray, the array is converted to words in a single vectorized operation (no per-element schema instance allocation):

yield from self.arr_master.write(np.array([1.0, -2.5, 3.14], dtype=np.float32))  # Float32
yield from self.arr_master.write(np.array([10, 20, 30], dtype=np.uint8))          # U8

elements may also contain schema instances or raw Python values — anything that element_type(value) accepts (slow path):

yield from self.arr_master.write([Float32(1.0), Float32(-2.5)])  # schema instances
yield from self.arr_master.write([1.0, -2.5, 3.14])              # raw values → Float32

ArrayTransferIFSlave

Parameter	Type	Default	Meaning
`transport`	`PhysicalTransport`	—	Physical layer to receive from
`element_type`	`type[DataSchema]`	—	Schema class for each element
`bitwidth`	`int`	`32`	Word width for deserialization
`rx_proc`	`Callable[[list], ProcessGen[None]] \\| None`	`None`	Push-mode callback
`pull_mode`	`bool`	`False`	When `True`, enables `get(count)` and disables the push callback

Two receive modes are available.

Push mode (default)

Set rx_proc to a callback and call pre_sim(). When a burst arrives, the element count is inferred from the burst length divided by element_type.nwords_per_inst(bitwidth), and rx_proc(elements) is called with the result.

For scalar FloatField / IntField element types, elements is a np.ndarray of the element’s native dtype (e.g. np.float32). For composite element types it is a list of deserialized instances.

def on_samples(self, elements: np.ndarray) -> ProcessGen[None]:
    # elements is np.ndarray[float32] — no per-element boxing
    process(elements)
    yield self.env.timeout(0)

slave = ArrayTransferIFSlave(
    sim=sim,
    transport=transport,
    element_type=Float32,
    bitwidth=32,
    rx_proc=self.on_samples,
)
slave.pre_sim()   # installs the receive callback

Pull mode

Set pull_mode=True (and do not set rx_proc). The owning component drives sequencing by calling get(count=n) directly. An exact element count is required; the burst length is validated against count * element_type.nwords_per_inst(bitwidth).

slave = ArrayTransferIFSlave(
    sim=sim,
    transport=transport,
    element_type=Float32,
    bitwidth=32,
    pull_mode=True,
)

# Inside the component's run_proc:
samples = yield from self.samp_slave.get(count=nsamp)
# samples is np.ndarray[float32] — use directly in numpy operations
values = samples.astype(float)

TLAST validation — get(count) raises RuntimeError if the burst does not match the expected length:

Condition	Message
`actual_words < count * nwords_per_elem`	`TLAST early: expected N words …`
`actual_words > count * nwords_per_elem`	`Missing TLAST: expected N words …`

This matches the error contract in the generated C++ utilities (TLAST_EARLY_SAMP_IN, NO_TLAST_SAMP_IN).

Lifecycle

In push mode, pre_sim() installs _on_words_received as the transport’s receive callback. When using Simulation.run_sim() this is called automatically. When driving env.run() directly, call slave.pre_sim() manually.

In pull mode, pre_sim() is a no-op and the physical slave’s run_proc exits immediately, leaving the data buffer available for get().

ArrayTransferIF

ArrayTransferIF is an optional Interface container that enforces consistency when binding endpoints:

from pysilicon.hw.schema_transfer_interface import ArrayTransferIF

iface = ArrayTransferIF(sim=sim)
iface.bind("master", master_ep)
iface.bind("slave",  slave_ep)

Binding raises:

TypeError if the wrong endpoint class is used for a side
ValueError if master and slave have different element_type or bitwidth

Example: push mode

The master sends a burst; the slave infers element count from burst length.

import numpy as np
from pysilicon.hw.clock import Clock
from pysilicon.hw.dataschema import FloatField
from pysilicon.hw.interface import StreamIF, StreamIFMaster, StreamIFSlave
from pysilicon.hw.schema_transfer_interface import (
    ArrayTransferIFMaster, ArrayTransferIFSlave, StreamTransport,
)
from pysilicon.simulation.simulation import Simulation
from pysilicon.simulation.simobj import ProcessGen, SimObj

Float32 = FloatField.specialize(bitwidth=32)

sim = Simulation()
clk = Clock(freq=1e9)

stream_if     = StreamIF(sim=sim, clk=clk)
stream_master = StreamIFMaster(sim=sim, bitwidth=32)
stream_slave  = StreamIFSlave(sim=sim, bitwidth=32)
stream_if.bind("master", stream_master)
stream_if.bind("slave",  stream_slave)

transport = StreamTransport(master_ep=stream_master, slave_ep=stream_slave)

received: list[np.ndarray] = []

def on_samples(elements: np.ndarray) -> ProcessGen[None]:
    received.append(elements)   # np.ndarray[float32]
    yield sim.env.timeout(0)

arr_master = ArrayTransferIFMaster(sim=sim, transport=transport, element_type=Float32, bitwidth=32)
arr_slave  = ArrayTransferIFSlave(
    sim=sim, transport=transport, element_type=Float32,
    bitwidth=32, rx_proc=on_samples,
)
arr_slave.pre_sim()

def tx():
    yield from arr_master.write(np.array([1.0, 2.0, 3.0], dtype=np.float32))

sim.env.process(stream_slave.run_proc())
sim.env.process(tx())
sim.env.run(until=sim.env.timeout(10))
# received[0] == np.array([1.0, 2.0, 3.0], dtype=np.float32)

Example: pull mode (PolyAccel pattern)

The motivating use case: a component receives a typed header first, then uses the header’s nsamp field to pull the right number of samples from the same physical stream.

class PolyAccelSim(SimObj):
    def __post_init__(self) -> None:
        super().__post_init__()
        self.stream_slave = StreamIFSlave(sim=self.sim, bitwidth=32)

        transport = StreamTransport(master_ep=..., slave_ep=self.stream_slave)

        self.cmd_slave = SchemaTransferIFSlave(
            sim=self.sim, transport=transport,
            schema_type=PolyCmdHdr, bitwidth=32,
            pull_mode=True,
        )
        self.samp_slave = ArrayTransferIFSlave(
            sim=self.sim, transport=transport,
            element_type=Float32, bitwidth=32,
            pull_mode=True,
        )

    def run_proc(self) -> ProcessGen[None]:
        while True:
            cmd     = yield from self.cmd_slave.get()                        # PolyCmdHdr
            samples = yield from self.samp_slave.get(count=cmd.nsamp)        # np.ndarray[float32]
            _, out, _ = self.accel.evaluate(cmd, samples.astype(float))
            yield from self.resp_master.write(...)

Both cmd_slave and samp_slave share the same transport (and therefore the same physical StreamIFSlave). Because run_proc is a sequential coroutine, there is no contention — cmd_slave.get() consumes the first burst, then samp_slave.get(count=nsamp) consumes the second.

Synthesis mapping

When generating Vitis HLS code, ArrayTransferIF maps to the utilities produced by gen_array_utils:

Python	C++
`ArrayTransferIFMaster(element_type=Float32).write(elements)`	`float32_array_utils::write_array(stream, data, n)`
`ArrayTransferIFSlave(element_type=Float32).get(count=n)`	`float32_array_utils::read_array(stream, n)`

gen_array_utils works for any DataSchema subclass, so an ArrayTransferIF parameterized on a composite DataList generates analogous struct-array utilities.

Wire footprint

`element_type`	Words per element	Words per `count`-element transfer
`Float32` (32-bit, word_bw=32)	1	`count`
`U8` (8-bit, word_bw=32)	1	`count`
`S16` (16-bit, word_bw=32)	1	`count`
`SomeDataList`	`SomeDataList.nwords_per_inst(word_bw)`	`count × nwords_per_inst`

Elements are packed tightly; no padding is inserted between elements.

Quick reference

from pysilicon.hw.schema_transfer_interface import (
    ArrayTransferIFMaster,
    ArrayTransferIFSlave,
    ArrayTransferIF,
    StreamTransport,
)

Operation	Code
Create transport	`StreamTransport(master_ep=m, slave_ep=s)`
Create master	`ArrayTransferIFMaster(sim=sim, transport=t, element_type=T, bitwidth=32)`
Create slave (push)	`ArrayTransferIFSlave(sim=sim, transport=t, element_type=T, bitwidth=32, rx_proc=fn)`
Create slave (pull)	`ArrayTransferIFSlave(sim=sim, transport=t, element_type=T, bitwidth=32, pull_mode=True)`
Transmit (numpy fast path)	`yield from master.write(np.array(..., dtype=np.float32))`
Transmit (schema / raw values)	`yield from master.write([v1, v2, …])`
Receive (push, manual setup)	`slave.pre_sim()` before `env.run()`
Receive (pull)	`arr = yield from slave.get(count=n)` — `np.ndarray` for scalar types
Words per element	`element_type.nwords_per_inst(bitwidth)`

See also: Schema Transfer Interface for the shared PhysicalTransport abstraction.