Consider the following SystemVerilog code that computes an output y based on inputs x1 and x2 in a single cycle:
always_ff @(posedge clk) begin
xreg1 <= x1;
xreg2 <= x2;
end
always_comb begin
z1 = f1(xreg1, xreg2); // delay = 3.8 ns
z2 = f2(xreg1, xreg2); // delay = 2.1 ns
z3 = f3(xreg1, z2); // delay = 1.5 ns
y = f4(z1, z3); // delay = 1.2 ns
end
z1, z2, xreg1 to achieve a two-cycle design.Consider the following SystemVerilog pipeline:
always_ff @(posedge clk) begin
x_s0 <= x;
z1_s1 <= 10*x_s0;
if (z1_s1 > 100) begin
z2_s2 <= 150 - (z1_s1>>>1);
end else begin
z2_s2 <= z1_s1;;
end
end
always_comb begin
y = (3*z2_s2>>>2);
end
which has input x and output y. Suppose the inputs x
on cycles 0, 1, and 2 are
\[
3, 6, 12
\]
and are unknown (?) on later cycles.
Write the values of the signals x_s0, z1_s1, z2_s2, and y for cycles 0 through 5. Assume that all registers are initialized to ? at cycle 0 (unknown).
Given an input x_0, we wish to create a hardware module that computes the maximum
We are given a SystemVerilog combinational function that implements f
z = f(x);
Write a pipelined SystemVerilog code for this module with II=1. On the first stage, only register the input -- do not do any further computation. You will need to then insert registers between stages, but each stage can have at most one instance of the function f.
The final output should be combinational.
Consider the following two loops. Each loop contains a loop-carried dependency that affects the minimum possible initiation interval (II) when pipelined in Vitis HLS.
sum = 0;
ind = 0;
for (int i = 0; i < 100; i++) {
sum += x[ind];
ind = ind_next[ind];
}
sum = 0;
ind = 0;
for (int i = 0; i < 100; i++) {
sum += x[ind];
if (sum > 0)
ind = ind_next_pos[ind];
else
ind = ind_next_neg[ind];
}
Assume the following operation latencies:
if selection) take 1 cycle.For each loop, one way to compute the minimum initiation interval (II) is to follow these steps:
xi = x[ind] corresponding to loading the values from the array or sum += xi corresponding to the addition operation.Stage 0: ...
Stage 1: ...
Stage 2: .... For example, in Loop A, one possible assignment is:Stage 0: xi = x[ind];, Stage 1: sum += xi;, ...
Using this procedure, determine the minimum II for:
Consider the following loop, where x is stored in a single-port memory (only one read per cycle is allowed):
int x[3*n], d[n];
for (int i = 0; i < n; i++) {
d[i] = x[3*i] * x[3*i] +
x[3*i+1] * x[3*i+1] +
x[3*i+2] * x[3*i+2];
}
Each loop iteration requires three reads from x: indices 3*i, 3*i+1, and 3*i+2.
Assume:
x may occur per cycle.(a) What is the minimum possible initiation interval (II) for this loop?
(b) Now suppose the data is stored in a packed format using a 96‑bit word:
ap_uint<96> xpacked[n];
Each 96‑bit element contains the three 32‑bit values previously stored at x[3*i], x[3*i+1], and x[3*i+2].
Rewrite the loop using xpacked and determine the new minimum II, assuming the same single-port memory. You can use the syntax p.range(31, 0) to extract the lower 32 bits of a 96‑bit variable p, and similar code to extract the other 32-bit variables.
Consider the following 4-tap moving average computation:
for (i = 0; i < ny; i++) {
yi = 0;
for (j = 0; j < 4; j++) {
yi += x[i+j];
}
y[i] = (yi >> 2);
}
Assume that x is stored in a single-port memory, so only one read from x may occur per cycle.
(a) What is the minimum possible initiation interval (II) of the outer loop?
(b) Suppose the inner loop is fully unrolled using #pragma HLS UNROLL.
Does this change the minimum II? Explain why or why not.
(c) Now consider using a delay line to reduce the number of memory reads per iteration:
int xdly[4];
#pragma HLS ARRAY_PARTITION variable=xdly complete
Rewrite the loop using the delay line so that the outer loop can achieve
II = 1.
Your rewritten loop should read only one new value from x per iteration.
Consider the following declaration:
float X[512][4];
#pragma HLS ARRAY_PARTITION variable=X dim=2 factor=4 cyclic
This pragma partitions the second dimension (dim=2) into 4 separate memory banks.
Thus, each column of X is stored in a different bank.
This means that values X[i0][0], X[i1][1], X[i2][2], and X[i3][3]
may all be read in the same clock cycle, because they come from different banks.
A Vitis HLS kernel contains a loop that may be unrolled for performance:
float x[1024], y[1024];
for (int i = 0; i < 1024; i++) {
#pragma HLS UNROLL factor=UF
y[i] = g(x[i]);
}
To support unrolling by a factor UF, the arrays must be partitioned into
UF independent memory banks. Assume Vitis maps each bank into separate BRAM
blocks with the following constraints:
x and y must not share BRAM blocks.
How many BRAM18 blocks are required to store the arrays x and y
for each of the following unroll factors:
UF = 1UF = 2UF = 4