Bio-inspired Multi-layer Spiking Neural Network Extracts Discriminative Features from Speech Signals

Spiking neural networks (SNNs) enable power-efficient implementations due to their sparse, spike-based coding scheme. This paper develops a bio-inspired SNN that uses unsupervised learning to extract discriminative features from speech signals, which can subsequently be used in a classifier. The architecture consists of a spiking convolutional/pooling layer followed by a fully connected spiking layer for feature discovery. The convolutional layer of leaky integrate-and-fire (LIF) neurons represents primary acoustic features. The fully connected layer is equipped with a probabilistic spike-timing-dependent plasticity (STDP) learning rule and represents the discriminative features through probabilistic LIF neurons. To assess the discriminative power of the learned features, they are used in a hidden Markov model (HMM) for spoken digit recognition. The experimental results show recognition accuracy above 96%, which compares favorably with popular statistical feature-extraction methods. Our results provide a novel demonstration of unsupervised feature acquisition in an SNN.
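
The abstract describes two key ingredients: a layer of LIF neurons and a fully connected layer trained with a probabilistic STDP rule. The sketch below is a rough illustration of those ingredients only, not the authors' exact formulation (whose equations are not reproduced here); all parameter names and values (tau_m, v_thresh, a_plus, a_minus, p_update) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch, under assumed parameters: an LIF layer driven by binary input
# spikes, plus a simplified stochastic STDP update in which potentiation and
# depression are applied probabilistically. Not the paper's exact model.

rng = np.random.default_rng(0)

class LIFLayer:
    def __init__(self, n_in, n_out, tau_m=20.0, v_thresh=1.0, dt=1.0):
        self.w = rng.uniform(0.0, 0.3, size=(n_out, n_in))  # synaptic weights
        self.v = np.zeros(n_out)                             # membrane potentials
        self.tau_m, self.v_thresh, self.dt = tau_m, v_thresh, dt

    def step(self, in_spikes):
        # Leaky integration of weighted presynaptic spikes.
        self.v += self.dt * (-self.v / self.tau_m + self.w @ in_spikes)
        out_spikes = (self.v >= self.v_thresh).astype(float)
        self.v[out_spikes > 0] = 0.0                          # reset after a spike
        return out_spikes

def probabilistic_stdp(w, pre_spikes, post_spikes,
                       a_plus=0.01, a_minus=0.005, p_update=0.5):
    """Simplified stochastic STDP: with probability p_update, synapses whose
    presynaptic neuron fired together with a spiking postsynaptic neuron are
    potentiated; the other synapses onto that neuron are depressed.
    Weights are kept in [0, 1]."""
    for j in np.flatnonzero(post_spikes):
        if rng.random() < p_update:
            w[j] += np.where(pre_spikes > 0,
                             a_plus * (1.0 - w[j]),
                             -a_minus * w[j])
    return np.clip(w, 0.0, 1.0)

# Toy usage with random input spike trains.
layer = LIFLayer(n_in=40, n_out=10)
for t in range(100):
    pre = (rng.random(40) < 0.1).astype(float)
    post = layer.step(pre)
    layer.w = probabilistic_stdp(layer.w, pre, post)
```

In the pipeline the abstract outlines, the spike trains produced after convolution and pooling would serve as the feature sequence passed to the HMM; this toy loop only shows the neuron dynamics and the stochastic weight update in isolation.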
