Evaluation Methodology

Python Model

A bit-exact Python model for the accelerator is defined in hwdesign/demos/conv2d/conv2d_demo.py. The same file serves three roles:

It defines the command, response, debug, and enum types with the PySilicon DataList and EnumField schema system.
It implements a software reference model in the Conv2DAccel class.
It provides the Conv2DTest driver that generates random data, prepares Vitis inputs, launches the HLS flow, and compares hardware outputs against the reference model.

The message schemas used by both the Python model and the C++ kernel are:

Conv2DCmd: image dimensions, kernel size, and the memory addresses of the input image, output image, and kernel
Conv2DResp: a single error code reporting whether the request completed successfully
Conv2DDebug: per-stage instrumentation containing a row index and an event code

This schema-first approach is important for evaluation because the generated C++ include files and the Python serializer/deserializer are derived from the same definitions. That keeps the software model, the C++ testbench, and the HLS kernel aligned on the exact bit layout of every message.
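
The idea of deriving both the packer and the unpacker from one field list can be illustrated with a minimal sketch. It does not use the PySilicon API. The field names follow the Conv2DCmd description above, but the exact names (im_in_addr and so on), the bit widths, and the LSB-first packing order are assumptions made only for this example.

```python
# Hypothetical field list: names follow the Conv2DCmd description,
# bit widths and ordering are assumed for illustration only.
CMD_FIELDS = [
    ("nrows", 16),
    ("ncols", 16),
    ("kernel_size", 8),
    ("im_in_addr", 32),
    ("im_out_addr", 32),
    ("kernel_addr", 32),
]


def pack(fields, values):
    """Pack named values into one integer, first field in the lowest bits."""
    word, shift = 0, 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(fields, word):
    """Recover the named values from a packed integer."""
    values, shift = {}, 0
    for name, width in fields:
        values[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return values


cmd = {
    "nrows": 64,
    "ncols": 48,
    "kernel_size": 3,
    "im_in_addr": 0x0000,
    "im_out_addr": 0x1000,
    "kernel_addr": 0x2000,
}
packed = pack(CMD_FIELDS, cmd)
roundtrip = unpack(CMD_FIELDS, packed)
```

Because pack and unpack both walk the same CMD_FIELDS list, a change to a field width cannot put the two sides out of step, which is the property the schema-first flow relies on.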

The reference computation itself is implemented with scipy.signal.correlate2d(..., mode="same", boundary="fill", fillvalue=0). This matches the hardware kernel’s zero-padded same-size 2D correlation behavior. After correlation, the Python model applies the same fixed-point post-processing as the HLS implementation:

\[y[m,n] = \mathrm{sat}_{[0,255]}\left( \frac{\sum_{i,j} k[i,j] x[m+i,n+j]}{2^{7}} \right)\]

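where the sum runs over the kernel window, with $x$ taken as zero outside the image and the window aligned as in mode="same". The division by $2^7$ is implemented as an arithmetic right shift, which rounds toward negative infinity, because the signed 8-bit kernel coefficients use kernel_fbits = 7 fractional bits. The result is then saturated to the unsigned 8-bit pixel range [0, 255].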
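
The shift-and-saturate step can be sketched in pure Python as follows. This is not the code in Conv2DAccel, which uses scipy.signal.correlate2d. The helper names postprocess and conv2d_ref are hypothetical, and the sketch assumes an odd-sized square kernel centred on the output pixel.

```python
def postprocess(acc, kernel_fbits=7):
    """Shift the accumulator right by kernel_fbits, then saturate to [0, 255]."""
    # Python's >> on negative ints rounds toward negative infinity,
    # which matches an arithmetic right shift.
    shifted = acc >> kernel_fbits
    return max(0, min(255, shifted))


def conv2d_ref(image, kernel, kernel_fbits=7):
    """Zero-padded, same-size 2D correlation for an odd-sized square kernel."""
    nrows, ncols = len(image), len(image[0])
    ksize = len(kernel)
    half = ksize // 2  # assumption: kernel is centred on the output pixel
    out = [[0] * ncols for _ in range(nrows)]
    for m in range(nrows):
        for n in range(ncols):
            acc = 0
            for i in range(ksize):
                for j in range(ksize):
                    r, c = m + i - half, n + j - half
                    # Pixels outside the image contribute zero (fillvalue=0).
                    if 0 <= r < nrows and 0 <= c < ncols:
                        acc += kernel[i][j] * image[r][c]
            out[m][n] = postprocess(acc, kernel_fbits)
    return out


image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
# Near-identity kernel: 127 / 2**7 is roughly 0.992 in the fixed-point format.
near_identity = [[0, 0, 0], [0, 127, 0], [0, 0, 0]]
result = conv2d_ref(image, near_identity)
```

With the largest positive signed 8-bit coefficient, 127, each output pixel comes out slightly below its input because the shift truncates. For example, 10 * 127 = 1270, and 1270 >> 7 = 9.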

Within Conv2DAccel.compute_conv2d(), the Python model also mirrors the hardware-side argument checks:

nrows must be between 1 and MAX_NROW = 512
ncols must be between 1 and MAX_NCOL = 512
kernel_size must be between 1 and MAX_KERNEL_SIZE = 4
invalid memory accesses raise an address error
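
A minimal sketch of these checks is shown below. The limits are the ones listed above. The error-code names and values, the order in which the checks run, and the check_region helper are assumptions for illustration, not the definitions used in conv2d_demo.py.

```python
from enum import IntEnum

MAX_NROW = 512
MAX_NCOL = 512
MAX_KERNEL_SIZE = 4


class ErrCode(IntEnum):
    """Hypothetical error codes; the real enum may use other names and values."""
    NO_ERROR = 0
    INVALID_NROWS = 1
    INVALID_NCOLS = 2
    INVALID_KERNEL_SIZE = 3
    ADDRESS_ERROR = 4


def check_args(nrows, ncols, kernel_size):
    """Return the first failing check, or NO_ERROR if all arguments are valid."""
    if not 1 <= nrows <= MAX_NROW:
        return ErrCode.INVALID_NROWS
    if not 1 <= ncols <= MAX_NCOL:
        return ErrCode.INVALID_NCOLS
    if not 1 <= kernel_size <= MAX_KERNEL_SIZE:
        return ErrCode.INVALID_KERNEL_SIZE
    return ErrCode.NO_ERROR


def check_region(addr, length, mem_size):
    """Flag an access of `length` words at `addr` that leaves the memory."""
    if addr < 0 or length < 0 or addr + length > mem_size:
        return ErrCode.ADDRESS_ERROR
    return ErrCode.NO_ERROR
```

Returning a code instead of raising keeps the sketch close to the hardware behavior, where the outcome is reported in the Conv2DResp error field.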

The Python model is therefore more than an approximate algorithmic model: it is the functional golden reference for both output data and error behavior.

Testbench

The Vitis C++ testbench is implemented in hwdesign/demos/conv2d/conv2d_tb.cpp. It is a file-based testbench that exchanges data with the Python harness through the hwdesign/demos/conv2d/data directory.

The testbench runs as follows:

It reads params.json to obtain nrows, ncols, and kernel_size.
It reads the binary input image and kernel files produced by the Python harness.
It allocates regions in a simulated external memory and writes the packed image and kernel data into that memory.
It constructs a Conv2DCmd message with the image dimensions and memory addresses.
It sends the command on the AXI4-Stream input, invokes the kernel, and waits for the AXI4-Stream response.
It drains the debug stream, decodes the emitted Conv2DDebug messages, and records them.
It reads the output image back from memory and writes all results to disk for post-processing by Python.

The main output artifacts produced by the testbench are:

resp_data.bin: serialized Conv2DResp
im_out_array.bin: output image written by the kernel
debug_data.bin: raw debug event stream
sync_status.json: summary of termination status, last debug event, and event count

The sync_status.json file is especially useful for evaluation because it confirms that:

the output response terminated with TLAST
each debug event terminated correctly on the AXI stream
the final event and row index are sensible

These checks catch interface-level errors, such as a missing TLAST, that would not necessarily show up as pixel mismatches.
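
A post-processing check over sync_status.json might look like the sketch below. The key names (resp_tlast, debug_tlast_ok, last_event, last_row, num_events) are hypothetical, as is the expectation that the final event is MAIN_END. The actual file layout is defined by conv2d_tb.cpp.

```python
import json


def check_sync_status(status, nrows):
    """Return a list of human-readable problems found in a status dict."""
    problems = []
    if not status.get("resp_tlast", False):
        problems.append("response did not terminate with TLAST")
    if not status.get("debug_tlast_ok", False):
        problems.append("a debug event did not terminate correctly")
    if status.get("num_events", 0) <= 0:
        problems.append("no debug events were recorded")
    # Assumption: a completed run ends with the MAIN_END event.
    if status.get("last_event") != "MAIN_END":
        problems.append(f"unexpected last event: {status.get('last_event')}")
    last_row = status.get("last_row", -1)
    if not 0 <= last_row < nrows:
        problems.append(f"last row index {last_row} is out of range")
    return problems


# Example status text; in the real flow it would be read from sync_status.json.
example_text = """
{"resp_tlast": true, "debug_tlast_ok": true,
 "last_event": "MAIN_END", "last_row": 63, "num_events": 386}
"""
good_status = json.loads(example_text)
good_problems = check_sync_status(good_status, nrows=64)

bad_status = dict(good_status, resp_tlast=False, last_row=64)
bad_problems = check_sync_status(bad_status, nrows=64)
```

Collecting all problems in a list, instead of stopping at the first one, makes a failed run easier to diagnose from a single report.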

Correctness Checks

Correctness is established by comparing the HLS testbench outputs against the golden results that the Python model produces in Conv2DTest.simulate().

The evaluation checks two things:

Response correctness: the Conv2DResp read from resp_data.bin must match the error code produced by the Python model
Output-image correctness: the contents of im_out_array.bin must exactly match the Python reference image element-by-element

These checks are performed by Conv2DTest.read_vitis_outputs(). If the response or output image differs from the reference model, the script raises a runtime error and prints both the observed and expected results.
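
The two comparisons can be sketched as follows. This is not the body of Conv2DTest.read_vitis_outputs(). The function name compare_outputs is hypothetical, and both the response and the image are assumed to be available as raw bytes, with one byte per pixel.

```python
def compare_outputs(resp_hw, resp_ref, im_hw, im_ref):
    """Raise RuntimeError if the response or any output pixel differs."""
    if resp_hw != resp_ref:
        raise RuntimeError(
            f"response mismatch: observed {resp_hw!r}, expected {resp_ref!r}"
        )
    if len(im_hw) != len(im_ref):
        raise RuntimeError(
            f"image size mismatch: observed {len(im_hw)} bytes, "
            f"expected {len(im_ref)} bytes"
        )
    for index, (observed, expected) in enumerate(zip(im_hw, im_ref)):
        if observed != expected:
            raise RuntimeError(
                f"pixel mismatch at offset {index}: "
                f"observed {observed}, expected {expected}"
            )


reference_image = bytes([9, 19, 29, 39])
compare_outputs(b"\x00", b"\x00", bytes([9, 19, 29, 39]), reference_image)

try:
    compare_outputs(b"\x00", b"\x00", bytes([9, 19, 30, 39]), reference_image)
    mismatch_message = None
except RuntimeError as exc:
    mismatch_message = str(exc)
```

Reporting the first mismatching offset together with the observed and expected values is usually enough to locate the failing row and column.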

Because the same Python harness generates the inputs, computes the expected result, and verifies the Vitis outputs, the full correctness loop is reproducible and automatable. The test is therefore stronger than a manual visual inspection of images.

Performance and Timing Evaluation

The repository also includes timing-oriented evaluation infrastructure.

During kernel execution, the accelerator emits debug events for:

MAIN_START and MAIN_END
LOAD_START and LOAD_END
COMPUTE_START and COMPUTE_END
STORE_START and STORE_END

These events are written once per major phase and once per processed row, allowing the user to reconstruct the execution timeline.

For RTL-level timing analysis, the flow can generate a VCD waveform dump. The helper script hwdesign/demos/conv2d/timing_analysis.py parses the VCD, extracts the input, output, and debug AXI4-Stream bursts, and decodes them back into typed Python objects. The notebook hwdesign/demos/conv2d/view_timing.ipynb then uses these decoded events to measure the duration of load, compute, and store phases across rows.

This provides a simple way to evaluate:

end-to-end command-to-response timing
the relative cost of memory load, compute, and store stages
whether the row-processing schedule behaves as expected
whether the pipelined inner loop dominates the execution time, as intended
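
The per-phase measurement described above can be sketched as a small pairing routine over decoded debug events. The event names are the ones listed earlier. The tuple layout (timestamp, event name, row), the timestamps, and the function name phase_durations are assumptions for this example and are not taken from timing_analysis.py.

```python
from collections import defaultdict


def phase_durations(events):
    """Pair *_START / *_END events and return the durations per phase.

    events: iterable of (timestamp, event_name, row) tuples in time order.
    """
    open_starts = {}
    durations = defaultdict(list)
    for timestamp, name, row in events:
        phase, _, edge = name.rpartition("_")
        # MAIN spans the whole run, so it is not keyed by row.
        key = phase if phase == "MAIN" else (phase, row)
        if edge == "START":
            open_starts[key] = timestamp
        elif edge == "END":
            start = open_starts.pop(key, None)
            if start is not None:
                durations[phase].append(timestamp - start)
    return dict(durations)


# Made-up timestamps in arbitrary units, for two rows.
events = [
    (0, "MAIN_START", 0),
    (10, "LOAD_START", 0), (50, "LOAD_END", 0),
    (50, "COMPUTE_START", 0), (150, "COMPUTE_END", 0),
    (150, "STORE_START", 0), (180, "STORE_END", 0),
    (180, "LOAD_START", 1), (220, "LOAD_END", 1),
    (220, "COMPUTE_START", 1), (320, "COMPUTE_END", 1),
    (320, "STORE_START", 1), (350, "STORE_END", 1),
    (360, "MAIN_END", 1),
]
durations = phase_durations(events)
totals = {phase: sum(values) for phase, values in durations.items()}
```

Comparing totals["COMPUTE"] with totals["LOAD"] and totals["STORE"] gives a quick indication of whether compute dominates the run.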

For synthesis-level metrics, Conv2DTest.write_csynth_reports() parses the Vitis HLS csynth.xml report and writes summarized loop and resource tables into the data directory:

csynth_loop_info.csv
csynth_loop_info.json
csynth_resources.csv

These files can be used to inspect loop pipelining, achieved initiation interval, and estimated resource usage without manually opening the Vitis GUI.

Reproducing the Evaluation

Run the evaluation flow from the hwdesign/demos/conv2d directory.

To run the Python model only:

python conv2d_demo.py --skip_vitis

To run C simulation and synthesis:

python conv2d_demo.py --through csynth

To run through co-simulation and generate a VCD for timing analysis:

python conv2d_demo.py --through generate_vcd --trace_level port

The most important evaluation outputs are then available under hwdesign/demos/conv2d/data and hwdesign/demos/conv2d/vcd.

Summary

The evaluation methodology combines three complementary components:

a bit-exact Python golden model for functional correctness
a file-based C++ testbench that verifies the HLS kernel and its AXI interfaces
timing and synthesis-report post-processing for performance analysis

Together, these provide confidence that the Conv2D accelerator is correct at the algorithmic level, consistent at the message/interface level, and measurable at the implementation level.