Feature Representations for Neuromorphic Audio Spike Streams

Event-driven neuromorphic spiking sensors such as the silicon retina and the silicon cochlea encode external sensory stimuli as asynchronous streams of spikes across different channels or pixels. Combining state-of-the-art deep neural networks with the asynchronous outputs of these sensors has produced encouraging results on some datasets but remains challenging. One reason is the lack of effective spiking networks to process the spike streams; another is that the pre-processing methods that convert the spike streams into the frame-based features required by deep networks still need further investigation. This work investigates the effectiveness of synchronous and asynchronous frame-based features, generated using spike count and constant event binning, in combination with a recurrent neural network on a classification task using the N-TIDIGITS18 dataset. This spike-based dataset consists of recordings from the Dynamic Audio Sensor, a spiking silicon cochlea sensor, in response to the TIDIGITS audio dataset. We also propose a new pre-processing method that applies an exponential kernel to the output cochlea spikes so that the interspike timing information is better preserved. The results on the N-TIDIGITS18 dataset show that the exponential features perform better than the spike count features, with over 91% accuracy on the digit classification task. This accuracy corresponds to an improvement of at least 2.5% over the use of spike count features, establishing a new state of the art for this dataset.
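The feature types named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameters `frame_len`, `tau`, and `events_per_frame`, and the exact form of the kernel (each spike weighted by its exponentially decayed distance to the end of its frame) are assumptions made for illustration. The sketch also assumes that synchronous frames use fixed time bins and that asynchronous frames use a fixed number of events per frame.

```python
import numpy as np

def time_binned_frames(times, channels, n_channels, frame_len, tau=None):
    """Build frames of fixed duration (assumed 'synchronous' binning).

    tau=None  -> each spike adds 1 (spike count feature).
    tau=float -> each spike adds exp(-(frame_end - t) / tau), an assumed
                 exponential kernel that keeps some spike timing information.
    """
    times = np.asarray(times, dtype=float)
    channels = np.asarray(channels, dtype=int)
    if times.size == 0:
        return np.zeros((0, n_channels))
    frame_idx = np.floor(times / frame_len).astype(int)
    n_frames = int(frame_idx.max()) + 1
    if tau is None:
        weights = np.ones_like(times)
    else:
        frame_end = (frame_idx + 1) * frame_len
        weights = np.exp(-(frame_end - times) / tau)
    frames = np.zeros((n_frames, n_channels))
    # np.add.at accumulates correctly when (frame, channel) pairs repeat.
    np.add.at(frames, (frame_idx, channels), weights)
    return frames

def event_binned_frames(channels, n_channels, events_per_frame):
    """Build spike count frames that each hold a fixed number of events
    (assumed 'asynchronous' binning). A trailing partial frame is dropped."""
    channels = np.asarray(channels, dtype=int)
    n_frames = len(channels) // events_per_frame
    frames = np.zeros((n_frames, n_channels))
    for i in range(n_frames):
        chunk = channels[i * events_per_frame:(i + 1) * events_per_frame]
        frames[i] = np.bincount(chunk, minlength=n_channels)
    return frames

# Toy spike stream: timestamps in seconds and the channel of each spike.
spike_times = [0.001, 0.004, 0.006, 0.012]
spike_channels = [0, 0, 1, 0]

count_frames = time_binned_frames(spike_times, spike_channels, 2, frame_len=0.005)
exp_frames = time_binned_frames(spike_times, spike_channels, 2, frame_len=0.005, tau=0.005)
event_frames = event_binned_frames(spike_channels, 2, events_per_frame=2)
```

In this sketch the two spikes in the first frame contribute equally to the spike count feature, while the exponential feature weights the later spike more heavily, which is one way the timing within a frame can be retained. Each resulting frame sequence could then be fed to a recurrent network one frame per time step.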
