Using the neural network accelerator#

The FPGA and MCU are connected via SPI and four GPIO pins:

MCU   | FPGA
PIO13 | GPIO0
PIO14 | GPIO1
PIO15 | GPIO2
PIO20 | GPIO3

To expose the neural network accelerator to an application running on the MCU, we use a middleware. The application calls a small set of C functions (the stub) to interact with a given HWFunction. Each stub is specific to its corresponding HWFunction. In our case, the HWFunction is the neural network accelerator.

        flowchart LR


subgraph FPGA
    direction TB
    HWFunction
    MWFPGA[Middleware]
    Skeleton
    MWFPGA --> Skeleton
    Skeleton --> HWFunction
end


subgraph MCU
    direction TB
    MWMCU[Middleware]
    Stub
    App
    Stub --> MWMCU
    App --> Stub
end

MWMCU --> MWFPGA


    
  • App: user-supplied application that calls the stub to access the neural network accelerator (HWFunction)

  • Stub:

    • passes data to the HWFunction and starts the computation

    • returns results of computation

    • allows App to check if HWFunction is loaded or not

    • allows App to load HWFunction

  • Middleware:

    • loads bitfiles from specific addresses in flash memory

    • disables/enables the skeleton, i.e., its corresponding HWFunction

    • passes memory-mapped I/O through to the skeleton

  • Skeleton:

    • counterpart to stub

    • specific to HWFunction

  • HWFunction:

    • in general an arbitrary function we want to execute on the FPGA

    • here: the neural network accelerator

Middleware Memory Mapped IO via SPI#

We transmit data via SPI in the following format to interact with the FPGA:

  • The first two bytes determine the message type and are transmitted high byte first.

  • The rest of the transferred data is the payload.

  • Bit 15 of the first two bytes selects between read and write. C code example:

    • uint16_t write_command = command | 0x8000;

    • uint16_t read_command = command & 0x7FFF;

  • The start of a message is marked by pulling the SPI slave select line low; the end is marked by pulling it high.

bit 15  | bits 14-0
r=0/w=1 | message type
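The header layout above can be illustrated with a small C helper. This is a minimal sketch, not the middleware's actual API: the names `mw_encode_header` and `MW_WRITE_FLAG` are hypothetical and only serve to show how bit 15 and the byte order fit together.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 15 of the 16-bit header selects write (1) or read (0). */
#define MW_WRITE_FLAG 0x8000u

/* Hypothetical helper: encode the two header bytes, high byte first. */
static void mw_encode_header(uint16_t message_type, bool is_write, uint8_t header[2])
{
    uint16_t command = is_write ? (uint16_t)(message_type | MW_WRITE_FLAG)
                                : (uint16_t)(message_type & 0x7FFFu);
    header[0] = (uint8_t)(command >> 8);    /* high byte is transmitted first */
    header[1] = (uint8_t)(command & 0xFFu); /* low byte follows */
}
```

For a write to message type 0x03, this yields the header bytes 0x80, 0x03; the payload bytes are sent afterwards while slave select stays low.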

Message Types: 0x00 - 0xFF:

  • LED: 0x03 (1 byte)

    • each of the lowest four bits controls one of the LEDs

    • 0=off, 1=on

    • e.g., command to turn on the first LED: uint8_t command[] = {0x80, 0x03, 0x01}; for (int i = 0; i < 3; i++) { send_byte(command[i]); }

  • USERLOGIC_CONTROL: 0x04 (1 byte)

    • sets the reset pin of the skeleton

  • Multiboot: 0x05-0x07 (3 bytes)

    • start address of the configuration to load from flash

    • reconfiguration is triggered once the write to 0x07 is complete

    • always write all three bytes

    • start with the lowest byte of the address, which is written to 0x05

    • example command: {0x80, 0x05, 0xAA, 0xAA, 0xAA}, where the three 0xAA bytes give the flash address of the configuration to load

  • other message types (0x08-0xFF) are reserved for future use
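The LED and Multiboot messages can be assembled as byte frames as sketched below. The function names are hypothetical. The sketch assumes that consecutive payload bytes land at consecutive addresses (0x05, 0x06, 0x07), as the example command suggests, and it masks the LED value to the lowest four bits as a defensive choice that the description does not require.

```c
#include <stddef.h>
#include <stdint.h>

enum { MW_LED = 0x03, MW_MULTIBOOT = 0x05 };

/* Build a write frame for the LED register; only the lowest four bits are used. */
static size_t mw_build_led_frame(uint8_t led_mask, uint8_t frame[3])
{
    frame[0] = 0x80;               /* high header byte with the write flag set */
    frame[1] = MW_LED;             /* low header byte: message type */
    frame[2] = led_mask & 0x0Fu;   /* 1 = on, 0 = off, one bit per LED */
    return 3;
}

/* Build a Multiboot frame from a 24-bit flash address, lowest byte first. */
static size_t mw_build_multiboot_frame(uint32_t flash_address, uint8_t frame[5])
{
    frame[0] = 0x80;
    frame[1] = MW_MULTIBOOT;
    frame[2] = (uint8_t)(flash_address & 0xFFu);         /* assumed to land at 0x05 */
    frame[3] = (uint8_t)((flash_address >> 8) & 0xFFu);  /* assumed to land at 0x06 */
    frame[4] = (uint8_t)((flash_address >> 16) & 0xFFu); /* 0x07: triggers reconfiguration */
    return 5;
}
```

Each frame would be sent byte by byte within a single slave select window, for example with the `send_byte` function used in the LED example.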

User Logic Region: 0x100 - ??

  • passed through to skeleton

  • the offset 0x100 is transparent to stub and skeleton
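One way to read the transparent offset is that the middleware adds 0x100 to every skeleton-relative address before it goes on the wire. The sketch below illustrates this under two assumptions that the text does not confirm: that the addition happens on the MCU side, and that the resulting address must fit into bits 14-0 of the header. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MW_USER_LOGIC_OFFSET 0x0100u
#define MW_ADDRESS_MASK      0x7FFFu /* bits 14-0 of the header */

/* Translate a skeleton-relative address into a middleware address.
 * Returns false if the result would not fit into the 15 header bits. */
static bool mw_user_logic_address(uint16_t skeleton_address, uint16_t *mw_address)
{
    uint32_t address = (uint32_t)skeleton_address + MW_USER_LOGIC_OFFSET;
    if (address > MW_ADDRESS_MASK) {
        return false;
    }
    *mw_address = (uint16_t)address;
    return true;
}
```

With this mapping, skeleton address 0 corresponds to middleware address 0x100, while stub and skeleton keep counting from 0.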

Skeleton v1#

The neural network skeleton supports data addresses 0 to 99. It uses its memory-mapped I/O as follows:

mode  | address (bytewise) | value (byte) | meaning
write | 100                | 0x01         | start computation
write | 100                | 0x00         | stop computation
write | 0 to 99            | arbitrary    | write up to 100 bytes of input data
read  | 0 to 99            | result       | read up to 100 bytes of computation result
read  | 2000               | id           | id of the loaded hw function

The byte for triggering computation start/stop is written to the address directly after the end of the input data.

The skeleton provides a busy signal and a done signal that indicate whether the computation is still running or has finished. FPGA GPIO2 (MCU pin PIO15) is connected to busy; the MCU can read that line to find out whether the computation has finished.
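A possible stub-side sequence for skeleton v1 is sketched below: write the input, start the computation, poll busy, read the result, then stop. The order of reading and stopping, the poll limit, and all names (`skeleton_io_t`, `skeleton_v1_predict`, the `fake_*` helpers) are assumptions for illustration. The in-memory fake only makes the sketch runnable on a host; it does not model the real skeleton.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hardware access layer; a real port would wrap the SPI
 * driver and the GPIO read of the busy line. */
typedef struct {
    void (*write)(uint16_t address, const uint8_t *data, size_t length);
    void (*read)(uint16_t address, uint8_t *data, size_t length);
    bool (*is_busy)(void);
} skeleton_io_t;

enum {
    SKELETON_V1_DATA_SIZE = 100, /* data addresses 0 to 99 */
    SKELETON_V1_CONTROL   = 100  /* start/stop byte directly after the data */
};

/* Returns false on oversized buffers or if busy stays high for more
 * than max_polls polls. */
static bool skeleton_v1_predict(const skeleton_io_t *io,
                                const uint8_t *input, size_t input_length,
                                uint8_t *result, size_t result_length,
                                uint32_t max_polls)
{
    const uint8_t start = 0x01, stop = 0x00;
    uint32_t polls = 0;

    if (input_length > SKELETON_V1_DATA_SIZE || result_length > SKELETON_V1_DATA_SIZE) {
        return false;
    }
    io->write(0, input, input_length);
    io->write(SKELETON_V1_CONTROL, &start, 1);
    while (io->is_busy()) {
        if (++polls > max_polls) {
            io->write(SKELETON_V1_CONTROL, &stop, 1);
            return false;
        }
    }
    io->read(0, result, result_length);
    io->write(SKELETON_V1_CONTROL, &stop, 1);
    return true;
}

/* In-memory fake for host-side experiments; it simply echoes the input. */
static uint8_t fake_memory[SKELETON_V1_DATA_SIZE + 1];
static unsigned fake_busy_polls;

static void fake_write(uint16_t address, const uint8_t *data, size_t length)
{
    memcpy(&fake_memory[address], data, length);
    if (address == SKELETON_V1_CONTROL && data[0] == 0x01) {
        fake_busy_polls = 3; /* pretend the computation takes three polls */
    }
}

static void fake_read(uint16_t address, uint8_t *data, size_t length)
{
    memcpy(data, &fake_memory[address], length);
}

static bool fake_is_busy(void)
{
    if (fake_busy_polls > 0) {
        fake_busy_polls--;
        return true;
    }
    return false;
}

static const skeleton_io_t fake_io = { fake_write, fake_read, fake_is_busy };
```

Passing the hardware access functions as a struct keeps the sequence independent of a particular SPI or GPIO driver, so the same code can run against the fake on a host and against real drivers on the MCU.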

Skeleton v2#

The neural network skeleton supports data addresses 18 to 20000. The control register occupies addresses 16 and 17.

The skeleton we use for neural networks uses its memory-mapped I/O as follows:

mode  | address (bytewise) | value (byte)  | meaning
write | 16                 | 0b XXXX XXX1  | start computation
write | 16                 | 0b XXXX XXX0  | stop computation
write | 17                 | 0b XXXX XXXX  | Reserved for Control Register
write | 18 to 20000        | arbitrary     | write up to 19983 bytes of input data
read  | 18 to 20000        | result        | read up to 19983 bytes of computation result
read  | 0 to 15            | id            | id of the loaded hw function
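The v2 memory map can be captured in a few constants and helpers, as in the following sketch. All names are hypothetical. Preserving the upper seven bits of the control byte is an assumption (the table marks them as don't-care), and the id is treated as 16 opaque bytes because the table gives only its address range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    SKELETON_V2_ID_START   = 0,     /* id occupies addresses 0 to 15 */
    SKELETON_V2_ID_SIZE    = 16,
    SKELETON_V2_CONTROL    = 16,    /* bit 0: 1 = start, 0 = stop */
    SKELETON_V2_DATA_START = 18,
    SKELETON_V2_DATA_END   = 20000,
    SKELETON_V2_DATA_SIZE  = SKELETON_V2_DATA_END - SKELETON_V2_DATA_START + 1
};

/* Set or clear the start bit while leaving the other bits unchanged. */
static uint8_t skeleton_v2_control_value(uint8_t current, bool start)
{
    return start ? (uint8_t)(current | 0x01u) : (uint8_t)(current & 0xFEu);
}

/* Map an offset into the data region to a skeleton address and check
 * that offset + length stays within addresses 18 to 20000. */
static bool skeleton_v2_data_address(size_t offset, size_t length, uint16_t *address)
{
    if (offset >= (size_t)SKELETON_V2_DATA_SIZE ||
        length > (size_t)SKELETON_V2_DATA_SIZE - offset) {
        return false;
    }
    *address = (uint16_t)(SKELETON_V2_DATA_START + offset);
    return true;
}

/* Compare the 16 id bytes read from addresses 0 to 15 with an expected id. */
static bool skeleton_v2_id_matches(const uint8_t *read_id, const uint8_t *expected_id)
{
    return memcmp(read_id, expected_id, SKELETON_V2_ID_SIZE) == 0;
}
```

A stub could use the id comparison to check whether the expected HWFunction is loaded before it writes any input data.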