A Multi-spike Approach for Robust Sound Recognition

The brain's remarkable performance on diverse cognitive tasks motivates the design of a biologically plausible system for the challenging task of environmental sound recognition. In this paper, we propose a novel approach based on multi-spike learning and key-point encoding. The encoding extracts local temporal and spectral information from the sound and converts it into a spatiotemporal spike pattern, which is then learned by downstream spiking neural networks. Our experiments demonstrate the robustness and effectiveness of the approach across a variety of noise conditions, outperforming conventional baseline methods in both mismatched and multi-condition scenarios.
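As a rough illustration of the idea (not the paper's exact algorithm), key-point encoding can be sketched as selecting local maxima of a time-frequency representation and emitting one spike per key point, with each spike addressed by its frequency channel and time frame. The function name, neighborhood size, and the mean-based flatness check below are assumptions made for this sketch:

```python
import numpy as np

def keypoint_encode(spec, neighborhood=1):
    """Convert a spectrogram (freq x time) into a spatiotemporal spike
    pattern: one spike (channel, time_frame) per local T-F maximum."""
    F, T = spec.shape
    spikes = []
    for f in range(F):
        for t in range(T):
            # local neighborhood around (f, t), clipped at the borders
            f0, f1 = max(0, f - neighborhood), min(F, f + neighborhood + 1)
            t0, t1 = max(0, t - neighborhood), min(T, t + neighborhood + 1)
            patch = spec[f0:f1, t0:t1]
            # key point: a strict local maximum that stands out from
            # its surroundings (rejects flat, energy-free regions)
            if spec[f, t] >= patch.max() and spec[f, t] > patch.mean():
                spikes.append((f, t))
    return spikes

# toy example: a spectrogram with a single energy peak
spec = np.zeros((8, 10))
spec[3, 5] = 1.0
print(keypoint_encode(spec))  # -> [(3, 5)]
```

The resulting sparse list of spikes can then serve as the input pattern for a spiking neural network trained with a multi-spike learning rule.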
