Striding Shift Register
Translation Stage: low-level IR to VHDL
A VHDL implementation of a shift register with configurable stride, useful for implementing strided operations in hardware designs, particularly for neural network architectures.
Features
Configurable data width
Configurable number of points
Adjustable stride length
Reset capability (async)
Valid input/output signaling
This component is particularly useful in neural network hardware implementations where strided operations are common, such as:
Strided convolutions
Pooling layers
Subsampling operations
Parameters
DATA_WIDTH
: Width of each data point
NUM_POINTS
: Number of data points to store
STRIDE
: Number of clock cycles to wait between shifts (default: 1)
Operation
The striding shift register extends a basic shift register with a stride parameter. When STRIDE > 1, the register accepts a new input only on every N-th clock cycle (where N is the stride value) and ignores the input on the cycles in between. This is particularly useful for implementing strided convolutions or pooling operations in hardware.
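The behaviour described above can be sketched as a small cycle-level model in Python. This is only an illustration, not the VHDL source: the class name `StridingShiftRegister`, the method names, the choice to accept the first valid input after reset, and the rule that `valid_out` is asserted only on cycles where a point was accepted into a full register are all assumptions made for this sketch.

```python
from collections import deque


class StridingShiftRegister:
    """Cycle-level behavioural model (hypothetical, not the VHDL source)."""

    def __init__(self, data_width, num_points, stride=1):
        self.mask = (1 << data_width) - 1        # keep DATA_WIDTH bits per point
        self.num_points = num_points
        self.stride = stride
        self.points = deque(maxlen=num_points)   # oldest point on the left
        self.count = 0                           # valid cycles since last accept

    def reset(self):
        """Clear the stored points and the stride counter."""
        self.points.clear()
        self.count = 0

    def clock(self, data_in, valid_in=True):
        """Advance one clock cycle and return (points, valid_out)."""
        accepted = False
        if valid_in:
            if self.count == 0:                  # assumption: first input is taken
                self.points.append(data_in & self.mask)
                accepted = True
            self.count = (self.count + 1) % self.stride
        # Assumption: the output is flagged valid only when a new point
        # has just been shifted into a completely filled register.
        valid_out = accepted and len(self.points) == self.num_points
        return list(self.points), valid_out


reg = StridingShiftRegister(data_width=2, num_points=3, stride=2)
for cycle, value in enumerate([1, 2, 3, 0, 2, 1]):
    points, valid = reg.clock(value)
    print(cycle, points, valid)
```

With `stride=1` the model accepts every valid input and behaves like a plain shift register.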
Use case example
The intended purpose of the component might not be obvious at first. Let’s assume we want to implement a neural network where two consecutive filters \(f_1\) and \(f_2\) have the following attributes:
\(f_1\)
output width: 2 bits (e.g., one per channel)
stride: 2
\(f_2\)
kernel size: 6 bits (e.g., three per channel)
Let’s assume further that we have a hardware component for \(f_1\) that processes one step of our input signal per clock cycle. To provide enough data for \(f_2\), we need a buffer that holds 6 bits. As \(f_1\) has stride 2, we want \(f_2\) to see only the results of every second step of \(f_1\), i.e., we want to ignore every second clock cycle.
Note
One would think that this leads to a lot of superfluous idle cycles where \(f_2\) spends its time waiting for \(f_1\). However, \(f_1\) will also have to wait for \(f_0\). Thus, the whole network can be at most as fast as its first layer.
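The mapping from the filter attributes to the register parameters can be made explicit with a few lines of Python. The helper name `derive_generics` and its argument names are hypothetical, and the sketch assumes that the kernel size of \(f_2\) is a whole multiple of the output width of \(f_1\).

```python
def derive_generics(producer_output_bits, producer_stride, consumer_kernel_bits):
    """Derive the register parameters from two consecutive filters.

    Hypothetical helper: the producer (f_1) writes one data point per
    step, and the consumer (f_2) needs a whole kernel of such points.
    """
    if consumer_kernel_bits % producer_output_bits != 0:
        raise ValueError("kernel size must be a multiple of the output width")
    return {
        "DATA_WIDTH": producer_output_bits,
        "NUM_POINTS": consumer_kernel_bits // producer_output_bits,
        "STRIDE": producer_stride,
    }


# f_1: output width 2 bits, stride 2; f_2: kernel size 6 bits
generics = derive_generics(2, 2, 6)
print(generics)  # {'DATA_WIDTH': 2, 'NUM_POINTS': 3, 'STRIDE': 2}
```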
We can visualize the process with the following waveform:
DATA_WIDTH => 2
NUM_POINTS => 3
STRIDE => 2
Input data is provided every clock cycle (valid_in = 1)
Due to stride = 2, the register only shifts on every second clock cycle
The output becomes valid once the register is filled
Each output represents 3 consecutive accepted inputs concatenated together, each with a width of 2 bits
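The points listed above can be reproduced with a short Python simulation for readers who prefer numbers to waveforms. This is a sketch under stated assumptions rather than a model of the actual VHDL: the function name `simulate` is hypothetical, the first cycle is assumed to be an accepted one, the oldest point is assumed to sit in the most significant bits of the output, and `valid_out` is assumed to pulse only on cycles where the register shifts. The input samples are made up for illustration.

```python
def simulate(samples, data_width=2, num_points=3, stride=2):
    """Return a list of (cycle, data_out, valid_out), one entry per clock cycle.

    Assumes valid_in = 1 on every cycle, as in the example above.
    """
    point_mask = (1 << data_width) - 1
    total_mask = (1 << (data_width * num_points)) - 1
    register = 0   # concatenated points, oldest in the most significant bits
    accepted = 0   # number of points shifted in so far
    trace = []
    for cycle, sample in enumerate(samples):
        shift = cycle % stride == 0          # assumption: cycle 0 is accepted
        if shift:
            register = ((register << data_width) | (sample & point_mask)) & total_mask
            accepted += 1
        # Assumption: valid_out is high only when a shift fills or refreshes
        # an already filled register.
        trace.append((cycle, register, shift and accepted >= num_points))
    return trace


samples = [0b01, 0b10, 0b11, 0b00, 0b10, 0b01, 0b00, 0b11]  # illustrative data
for cycle, data_out, valid_out in simulate(samples):
    print(f"cycle {cycle}: data_out={data_out:06b} valid_out={int(valid_out)}")
```

In this run the first valid output appears in cycle 4 as `011110`, which is the concatenation of the accepted inputs `01`, `11`, and `10` from cycles 0, 2, and 4.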