Always-On Speech Recognition Using TrueNorth, a Reconfigurable, Neurosynaptic Processor

Deep neural networks (DNNs) have been shown to be very effective at solving challenging problems in several areas of computing, including vision, speech, and natural language processing. However, traditional platforms for implementing these DNNs are often very power hungry, which has led to significant efforts in the development of configurable platforms capable of implementing these DNNs efficiently. One of these platforms, the IBM TrueNorth processor, has demonstrated very low operating power in performing visual computing and neural network classification tasks in real time. The neuron computation, synaptic memory, and communication fabrics are all configurable, so that a wide range of network types and topologies can be mapped to TrueNorth. This reconfigurability translates into the capability to support a wide range of low-power functions in addition to feed-forward DNN classifiers, including, for example, the audio processing functions presented here. In this work, we propose an end-to-end audio processing pipeline that is implemented entirely on a TrueNorth processor and designed specifically to leverage the highly parallel, low-precision computing primitives TrueNorth offers. As part of this pipeline, we develop an audio feature extractor (LATTE) designed for implementation on TrueNorth, and explore the tradeoffs among several design variants in terms of accuracy, power, and performance. We customize the energy-efficient deep neuromorphic network structures that our design uses as the classifier and show how classifier parameters can trade between power and accuracy. In addition to enabling a wide range of diverse functions, the reconfigurability of TrueNorth enables re-training and re-programming the system to satisfy varying energy, speed, area, and accuracy requirements.
The resulting system's end-to-end power consumption can be as low as $14.43\;\text{mW}$, which would give up to 100 hours of continuous usage with button cell batteries (CR3023, $1.5\;\text{Whr}$) or 450 hours with cellphone batteries (iPhone 6s, $6.55\;\text{Whr}$).
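
The battery-life figures above follow from dividing battery capacity by average power draw. The short Python sketch below reproduces that arithmetic under an idealized assumption that is not stated in the source: the full rated capacity is usable at a constant draw, with no conversion losses or self-discharge. The function and variable names are illustrative only.

```python
def battery_life_hours(capacity_wh: float, power_mw: float) -> float:
    """Return hours of continuous operation at a constant power draw.

    Idealized model (an assumption): the whole rated capacity is usable
    and the draw is constant, so lifetime = capacity / power.
    """
    if power_mw <= 0:
        raise ValueError("power_mw must be positive")
    return capacity_wh / (power_mw / 1000.0)  # convert mW to W


# Figures quoted in the abstract.
SYSTEM_POWER_MW = 14.43
BATTERIES_WH = {
    "button cell": 1.5,   # capacity as quoted in the abstract
    "iPhone 6s": 6.55,    # capacity as quoted in the abstract
}

# Estimated lifetime in hours for each battery.
LIFETIMES = {
    name: battery_life_hours(capacity, SYSTEM_POWER_MW)
    for name, capacity in BATTERIES_WH.items()
}

if __name__ == "__main__":
    for name, hours in LIFETIMES.items():
        print(f"{name}: about {hours:.0f} hours")
```

Under this model both results land slightly above the rounded figures of 100 and 450 hours quoted in the text.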
