Fixed-Point and Floating-Point Numbers: Questions

Question 1. Custom floating-point decoding

Consider a simple floating‑point representation \( x = A \, (-1)^s \, c \, 2^e \), where the scale factor is \(A = 2^{-3}\). The fields are:

The bits are packed with the sign bit first, followed by the exponent bits, then the mantissa bits. In the patterns below, spaces separate the three fields (1 sign bit, 3 exponent bits, 5 mantissa bits). What decimal values do the following bit patterns represent?

  1. 0 000 10000
  2. 1 011 11000
  3. 0 101 00101

Question 2. Custom floating-point encoding

Consider a custom floating-point format \[ \hat{x} = (-1)^s \left(1 + \frac{c}{2^p}\right) 2^{e - 4}, \] where:

For each real number \(x\) below, choose \(s\), \(c\), and \(e\) so that \(\hat{x}\) is as close as possible to \(x\). Give your final answers as \((s, c, e)\) and the corresponding \(\hat{x}\) in decimal.

  1. \(x = 3.0\)
  2. \(x = 0.5\)
  3. \(x = -5.0\)
  4. \(x = 1.3\)

Question 3. Fixed-point linear approximation

You wish to approximate the floating‑point equation \[ y = a x + b \] using Qm.n fixed‑point arithmetic.

Let \(a = 0.3125\), \(b = -1.75\), and choose the format Q3.4 (3 integer bits including sign, 4 fractional bits).

  1. Convert \(a\) and \(b\) into their Q3.4 integer representations.
  2. Write SystemVerilog code that computes an approximation of \(y\) using only Q3.4 arithmetic.
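
As a way to check a SystemVerilog answer, the arithmetic can be modelled with plain integers. The sketch below is a Python reference model, not a hardware implementation. It assumes Q3.4 means a 7-bit signed two's-complement integer with value raw / 16, that the product is kept at full width (8 fractional bits) before \(b\) is added, and that the final right shift truncates toward negative infinity. The helper names `to_q`, `from_q`, and `linear_q3_4` are hypothetical and chosen only for illustration.

```python
FRAC_BITS = 4    # fractional bits in Q3.4
TOTAL_BITS = 7   # assumed width: 3 integer bits (including sign) + 4 fractional bits


def to_q(value, frac_bits=FRAC_BITS, total_bits=TOTAL_BITS):
    """Scale a real number by 2**frac_bits and round to the nearest integer."""
    raw = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    if not lo <= raw <= hi:
        raise OverflowError(f"{value} does not fit in the assumed format")
    return raw


def from_q(raw, frac_bits=FRAC_BITS):
    """Convert a Q-format integer back to a real number."""
    return raw / (1 << frac_bits)


def linear_q3_4(x_raw):
    """Approximate y = a*x + b, with x and y as Q3.4 integers."""
    a_raw = to_q(0.3125)
    b_raw = to_q(-1.75)
    # Q3.4 * Q3.4 gives a product with 8 fractional bits.
    prod = a_raw * x_raw
    # Align b to 8 fractional bits before adding.
    total = prod + (b_raw << FRAC_BITS)
    # Arithmetic right shift drops 4 fractional bits (floors the result).
    return total >> FRAC_BITS
```

Calling `from_q(linear_q3_4(to_q(2.0)))` evaluates the line at \(x = 2.0\). Under these assumptions the result always differs from the exact \(a x + b\) by less than one least-significant bit (\(2^{-4}\)), because the only error source is the final truncation.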

Question 4. Fixed-point multiplication with saturation

Write SystemVerilog code to implement the computation \[ y = a \cdot b \] where all variables (\(a\), \(b\), and \(y\)) are represented in Q5.4 format. Instead of truncating the result of the multiplication, perform saturation as follows: