Noise-robust acoustic signature recognition using nonlinear Hebbian learning

We propose a new biologically inspired approach, nonlinear Hebbian learning (NHL), for acoustic signal recognition in noisy environments. The proposed system processes both spectral and temporal features of the input acoustic data. Spectral analysis is performed with auditory gammatone filterbanks. Temporal dynamics are addressed by analyzing gammatone-filtered feature vectors over multiple temporal frames, which yields a spectro-temporal representation (STR). Given STR features, the exact acoustic signatures of the signals of interest, and the way those signals mix with noise, are generally unknown. NHL is therefore employed to extract representative independent features from the STRs and to reduce their dimensionality. The extracted independent features of the signals of interest are called signatures. During learning, the synaptic weight vectors between input and output neurons are updated adaptively. These weight vectors project the data into a feature subspace in which signals of interest are selected while noise is attenuated. Whereas linear Hebbian learning (LHL) exploits only the second-order moments of the data, NHL involves higher-order statistics. NHL can therefore capture representative features that are more statistically independent than those captured by LHL. In addition, the nonlinear activation function of NHL can be chosen to match the implicit distribution of many acoustic sounds, which optimizes the learning in terms of mutual information. Simulation results show that the complete system recognizes signals of interest more accurately than conventional methods under severe noise. One application is the detection of moving vehicles: noise-contaminated vehicle sound is recognized while non-vehicle sounds are rejected.
When vehicle sound is contaminated by human vowels, bird chirps, or additive white Gaussian noise (AWGN) at SNR = 0 dB, the proposed system reduces the error rate relative to the commonly used acoustic feature extraction method, mel-frequency cepstral computation (MFCC), by 26%, 36.3%, and 60.3%, respectively, and relative to LHL by 20%, 2.3%, and 15.3%, respectively. Another application is vehicle type identification. Here the proposed system outperforms LHL, e.g., by 40% when the sound of a gasoline heavy wheeled car is contaminated by AWGN at SNR = 5 dB. More importantly, the proposed system has been run in real-time field tests for months. The goal is to detect vehicles of any make or model moving on the street at 10-35 mph. The miss rate is 1-2% when vehicle sound is contaminated by surrounding noise (human conversation, animal sounds, airplanes, wind, etc.) at SNR = 0-20 dB, and the false alarm rate is around 1%. In summary, this study provides an efficient approach for extracting representative independent features from high-dimensional data and offers robustness against severe noise.
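
The feature pipeline described above (stacking filterbank outputs over several frames into an STR, then learning a projection with a nonlinear Hebbian rule) can be illustrated with a minimal sketch. The abstract does not state the update equation, the number of stacked frames, or the activation function, so everything specific below is an assumption: the rule is a Sanger-type generalized Hebbian update with a `tanh` nonlinearity applied to the outputs, and the function names `stack_frames` and `nhl_train`, the learning rate, and the random stand-in for gammatone features are all hypothetical. It is not the authors' implementation.

```python
import numpy as np


def stack_frames(features, context):
    """Build spectro-temporal vectors by concatenating `context` consecutive
    frames of per-frame spectral features (shape: n_frames x n_channels)."""
    n_frames, _ = features.shape
    n_out = n_frames - context + 1
    return np.stack([features[i:i + context].ravel() for i in range(n_out)])


def nhl_train(X, n_outputs, lr=1e-3, epochs=5, seed=0):
    """Assumed nonlinear Hebbian rule (Sanger-type update with tanh outputs).

    X has shape (n_samples, n_inputs); returns W of shape (n_outputs, n_inputs).
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)  # remove the mean before learning
    W = rng.normal(scale=0.1, size=(n_outputs, X.shape[1]))
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            y = np.tanh(W @ x)  # nonlinear output activation (assumption)
            # Hebbian growth term minus a lower-triangular decay term that
            # keeps the weights bounded and decorrelates the outputs.
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W


# Random data standing in for gammatone filterbank outputs (hypothetical).
rng = np.random.default_rng(1)
spectral = rng.normal(size=(300, 8))     # 300 frames, 8 channels
strs = stack_frames(spectral, context=4) # each row spans 4 frames
W = nhl_train(strs, n_outputs=3)
signatures = (strs - strs.mean(axis=0)) @ W.T  # low-dimensional features
```

With these stand-in dimensions, each 32-dimensional STR is projected onto 3 learned directions. A real system would replace the random matrix with gammatone filterbank energies and would choose the nonlinearity to suit the data distribution, as the abstract indicates.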
