Overview#
SPECK™ is a family of spiking neural network ASICs designed for convolutional spiking neural network (CSNN) based vision processing tasks. Speck™ is the world's first neuromorphic device to integrate the DYNAP-CNN neuromorphic processor and a dynamic vision sensor (DVS) in a single SoC. The Speck™ processor, DYNAP-CNN, is a fully scalable, event-driven neuromorphic processor with up to 0.32M configurable spiking neurons and a direct interface to external DVS sensors.
The DYNAP-CNN processor can also be used directly, bypassing the DVS camera integrated on Speck.
Currently, the sinabs-dynapcnn library provides an interface to several available hardware versions of the SPECK™ family:
Device Name | Identifier
---|---
Speck 2E Test Board | Speck2eTestBoard
Speck 2E Development Kit | speck2edevkit
Speck 2F Module | speck2fmodule
Speck 2F Development Kit | speck2fdevkit
DYNAP-SE2 DevBoard | dynapse2
DYNAP-SE2 Stack | dynapse2_stack

Note: the identifier column lists the software identifier we assigned to each devkit.
The general top-level diagram of the SPECK™ chip is shown below:
Besides the embedded internal DVS, Sinabs allows you to interact with the DynapCNN processor using external DVS sensors or pre-defined event data for development. Currently, Speck supports the following external DVS sensors through samna:
Inivation Davis346
Inivation Davis240
Inivation DVXplorer
Prophesee EVK3-Gen 3.1VGA
Backend: Dynapcnn#
To interact with the processor, sinabs requires the samna dependency, which enables chip configuration and network deployment. As shown in the figure below, the dynapcnn backend provides a simple way to convert the network structure and parameters into a SamnaConfiguration that samna can use to set up the chip.
Chip Resources#
All SPECK™ family chips use the DYNAP asynchronous computing architecture and have similar computation resources. The detailed features are listed below.
Key Features#
Async input interface: Speck devkits can be configured to take input either from the internal DVS sensor or from external DVS sources
Async output interface: 1. monitor bus output 2. readout layer output
Interrupt: 4 output pins encode at most 15 + 1 (no class) outputs
128x128 DVS array in which every pixel can be individually disabled (killed)
1x event (DVS) pre-processing layer
9x DYNAP-CNN layers
1x readout layer that makes predictions based on the DYNAP-CNN layer output
Event Pre-processing Layer#
The event pre-processing layer receives input events coming from the DVS or an external source. Depending on the configuration, the layer can perform the following operations on events:
Merge/Select the polarity of the input event stream
Sum pooling with kernel size 1x1, 2x2, or 4x4
ROI selection
Mirroring the event stream
Rotate 90 degrees
An event noise filter is also included in the pre-processing layer and can be enabled in the user settings. A general pre-processing pipeline is shown below:
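The listed operations can be illustrated with a small software model. This is a plain-Python sketch of the pre-processing steps, not hardware code: the event layout (x, y, polarity), helper names, and sample values are assumptions for illustration only.

```python
# Illustrative model of the event pre-processing operations (not hardware code).
# An event is modeled as (x, y, polarity); the 128x128 geometry follows the text,
# everything else is a simplification.

WIDTH = HEIGHT = 128

def merge_polarity(events):
    """Collapse ON/OFF polarities into a single channel."""
    return [(x, y, 0) for x, y, _ in events]

def roi_select(events, x0, y0, x1, y1):
    """Keep only events inside the region of interest (inclusive bounds)."""
    return [(x, y, p) for x, y, p in events if x0 <= x <= x1 and y0 <= y <= y1]

def sum_pool(events, k):
    """Sum pooling with kernel k maps each event onto a coarser grid cell."""
    return [(x // k, y // k, p) for x, y, p in events]

def mirror_x(events):
    """Mirror the event stream horizontally."""
    return [(WIDTH - 1 - x, y, p) for x, y, p in events]

stream = [(10, 20, 1), (11, 20, 0), (100, 5, 1)]
stream = roi_select(stream, 0, 0, 63, 63)   # drops the event at x=100
stream = sum_pool(stream, 2)                # 2x2 sum pooling
stream = merge_polarity(stream)
print(stream)  # [(5, 10, 0), (5, 10, 0)]
```

Note that for event streams, pooling and mirroring are pure coordinate remappings; no frame is ever built.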
DYNAP-CNN Layer#
The DYNAP-CNN layer is the main hardware representation of the designed spiking neural network structure. One of the main goals of sinabs-dynapcnn is to provide an efficient, simple way to convert a torch.nn.Sequential
object into an equivalent DYNAP-CNN layer configuration. For more details on how to use sinabs-dynapcnn to interact with your designed SNNs, please check our tutorials.
Feature#
Max input dimension per layer: 128x128
Max output feature map size: 64x64
Max channel number: 1024
Weight resolution: 8-bit
Neuron state resolution: 16-bit
Max convolutional kernel size: 16x16
Stride: 1, 2, 4, 8
Padding: 0, 1, 2, 3, 4, 5, 6, 7
Sum pooling: 1x1, 2x2, 4x4
Fanout: 2
Internal Execution Order#
A single chip consists of 9 configurable computing cores (layers); each layer can be regarded as a combination of (Conv2d operation –> spiking activation –> sum pooling). These computations have to be configured in exactly this execution order. Each layer can be flexibly configured to communicate with other layers.
Async event-driven feature#
Information is communicated between layers only in an "event-based" format: a layer processes an incoming event as soon as it receives it. Each layer can be configured with 1-2 destination layers.
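The routing behavior can be sketched as a small simulation. The layer numbering and the routing table below are made up for illustration; only the rule "each layer forwards an incoming event to at most two configured destinations" comes from the text.

```python
# Minimal sketch of event-driven inter-layer routing: each layer forwards an
# incoming event to its (at most two) configured destination layers as soon
# as the event arrives. The routing table here is hypothetical.

destinations = {
    0: [1],        # layer 0 feeds layer 1
    1: [2, 3],     # layer 1 fans out to its two destinations
    2: [],         # layers 2 and 3 terminate (e.g. monitor/readout) here
    3: [],
}

def propagate(layer, event, visited=None):
    """Recursively deliver an event along the configured destinations,
    returning the layers it traversed in order."""
    visited = [] if visited is None else visited
    visited.append(layer)
    for dest in destinations[layer]:
        propagate(dest, event, visited)
    return visited

print(propagate(0, "spike"))  # [0, 1, 2, 3]
```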
Memory Constraints and Network Sizing#
Each layer has different memory constraints, split into kernel memory (for weight parameters) and neuron memory (for spiking neuron states), as shown below. Note: across the entire chip series, Speck supports 8-bit integer precision for kernel parameters and 16-bit integer precision for neuron states.
With a convolutional layer defined by
\(c\), the number of input channels
\(f\), the number of output channels
\(k_x\) and \(k_y\), the kernel size
the theoretical number of kernel memory entries \(K_M\) required is:
\(K_M = c \times f \times k_{x} \times k_{y}\)
Because of the address encoding scheme, the actual number of kernel memory entries required on chip, \(K_{MT}\), is larger:
\(K_{MT} = c \cdot 2^{\lceil \log_2 (k_{x} k_{y}) \rceil + \lceil \log_2 f \rceil}\)
The required number of neuron memory entries depends on the output feature map size. With input feature map size \(c_{x}\), \(c_{y}\), stride \(s_{x}\), \(s_{y}\), and padding \(p_{x}\), \(p_{y}\):
\(f_x = \frac{c_{x}-k_{x}+2p_{x}}{s_{x}} + 1\)
\(f_y = \frac{c_{y}-k_{y}+2p_{y}}{s_{y}} + 1\)
The number of neuron memory entries \(N_{M}\) is then:
\(N_{M} = f \times f_{x} \times f_{y}\)
Take the following convolutional layer as an example:
```python
conv_layer = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3,3), stride=(1,1), padding=(1,1))
```
Assuming an input dimension of 64x64, we obtain the output feature map size:
\(f_x = \frac{64-3+2 \times 1}{1} + 1 = 64\)
\(f_y = \frac{64-3+2 \times 1}{1} + 1 = 64\)
The actual number of kernel memory entries is:
\(K_{MT}=16 \times 32 \times 4 \times 4 = 8Ki\)
The number of neuron memory entries is then:
\(N_{M} = 64 \times 64 \times 32 = 128Ki\)
Since 128Ki neuron entries exceed the neuron memory constraints of every one of the 9 layers, this layer CANNOT be deployed on the chip.
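The calculation above can be reproduced in a few lines of plain Python. This is a sketch of the sizing formulas only (the function names are our own); it does not query the chip's actual per-layer limits, which you should take from the chip resources table.

```python
import math

# Reproduces the kernel- and neuron-memory sizing from the example above.
# The power-of-two rounding implements the address-encoding formula for K_MT.

def kernel_memory_entries(c, f, kx, ky):
    """Actual kernel memory entries K_MT with power-of-two address encoding."""
    return c * 2 ** (math.ceil(math.log2(kx * ky)) + math.ceil(math.log2(f)))

def neuron_memory_entries(f, cx, cy, kx, ky, sx, sy, px, py):
    """Neuron memory entries N_M = f * f_x * f_y for the output feature map."""
    fx = (cx - kx + 2 * px) // sx + 1
    fy = (cy - ky + 2 * py) // sy + 1
    return f * fx * fy

# nn.Conv2d(16, 32, kernel_size=(3,3), stride=(1,1), padding=(1,1)), 64x64 input
k_mt = kernel_memory_entries(c=16, f=32, kx=3, ky=3)
n_m = neuron_memory_entries(f=32, cx=64, cy=64, kx=3, ky=3, sx=1, sy=1, px=1, py=1)
print(k_mt // 1024, "Ki kernel entries")  # 8 Ki
print(n_m // 1024, "Ki neuron entries")   # 128 Ki
```

Running this kind of check before deployment makes it easy to spot layers whose neuron memory footprint rules them out.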
Leak operation#
Each layer includes a leak generation block, which updates all configured neuron states with the provided leak values based on a reference clock signal. Check out the tutorial on how to use the leak feature.
Congestion balancer#
In the latest development kit (speck2e), each DYNAP-CNN layer is equipped with a congestion balancer on its input data path. It drops incoming events whenever the convolutional cores are overloaded.
Decimator#
In the latest development kit (speck2e), each DYNAP-CNN layer is equipped with a decimator block on its output data path. The decimator lets the user reduce the spike rate at the output of a convolution layer. When enabled, the decimator passes 1 spike for every N output spikes (N = [2,4,8,16,32,128,256,512]).
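A toy model of the 1-in-N behavior is shown below. Which spike within each group of N is forwarded (first, last, etc.) is an assumption of this sketch; only the N-fold rate reduction is taken from the text.

```python
# Toy model of the output decimator: forward one spike for every N output
# spikes. Here the N-th spike of each group is forwarded; the actual phase
# on hardware is an assumption of this sketch.

def decimate(spikes, n):
    out, count = [], 0
    for spike in spikes:
        count += 1
        if count == n:        # forward every n-th spike
            out.append(spike)
            count = 0
    return out

spikes = list(range(10))
print(decimate(spikes, 4))  # [3, 7] -> output rate reduced by 4x
```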
Readout Layer#
The readout layer provides output data via the output serial interface. The readout time window is driven by an internal slow clock and can be configured to 1, 16, or 32 times the provided clock cycle. 4 different addressing modes can be selected to map input spikes to the readout layer's 15 available output classes.
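The windowed readout can be sketched in software as follows. This is a hedged model of the idea only: per class, count spikes within each slow-clock window, keep a moving average over recent windows, and report the strongest class. The class count, averaging depth, and class-selection rule here are illustrative assumptions, not the chip's exact behavior.

```python
from collections import deque

# Hedged sketch of a windowed readout stage: per-class spike counts are
# accumulated per slow-clock window, averaged over recent windows, and the
# class with the highest moving average is reported. Parameters are
# illustrative only.

class Readout:
    def __init__(self, num_classes=15, depth=4):
        self.histories = [deque(maxlen=depth) for _ in range(num_classes)]

    def update(self, window_counts):
        """window_counts[i] = spikes seen for class i during one window."""
        for hist, count in zip(self.histories, window_counts):
            hist.append(count)
        averages = [sum(h) / len(h) for h in self.histories]
        return max(range(len(averages)), key=averages.__getitem__)

readout = Readout(num_classes=3)
readout.update([1, 5, 0])
print(readout.update([2, 9, 1]))  # class 1 wins on moving average
```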
Internal Slow clock#
Only available in the latest speck2e, an internal slow clock supports a number of time-cycle based features:
Leak clock: the DYNAP-CNN layers include a leak operation that updates all configured neuron states based on the clock setting.
DVS pre-processing filter: the DVS filter uses the slow clock as the time reference to update its internal states.
Readout layer: the readout layer uses the slow clock cycle as the moving-average step when calculating moving averages.
Note: the slow clock is generated internally by dividing the internal DVS raw event rate, so its accuracy is not always guaranteed.