Robust Cochlear-Model-Based Speech Recognition

Accurate speech recognition can provide a natural interface for human–computer interaction. The recognition rate of a modern speech recognition system depends strongly on the background noise level, and the choice of acoustic feature extraction method can have a significant impact on system performance. This paper presents a robust speech recognition system based on a front-end motivated by human cochlear processing of audio signals. In the proposed front-end, cochlear behavior is emulated first by the filtering operations of a gammatone filterbank and then by an inner hair cell (IHC) processing stage. Experimental results with a continuous-density Hidden Markov Model (HMM) recognizer show that the proposed Gammatone Hair Cell (GHC) coefficients fall below the standard Mel-Frequency Cepstral Coefficient (MFCC) baseline on clean speech, but yield a significant improvement in performance under noisy conditions.
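
The sketch below is a minimal illustration (Python with NumPy/SciPy, assumed here rather than taken from the paper) of this kind of auditory front-end: ERB-spaced gammatone filtering followed by a simplified IHC stage, with half-wave rectification and lowpass smoothing standing in for Meddis-style mechanical-to-neural transduction, then log compression and a DCT to produce cepstrum-like coefficients. All function names, channel counts, and frame parameters are illustrative assumptions, not the authors' GHC implementation.

```python
# Illustrative gammatone + inner-hair-cell front-end sketch.
# NOT the paper's exact GHC pipeline; parameters are arbitrary assumptions.
import numpy as np
from scipy.signal import lfilter, butter
from scipy.fftpack import dct

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37e-3 * fc + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Truncated impulse response of a gammatone filter centred at fc."""
    t = np.arange(0, duration, 1.0 / fs)
    b = 1.019 * erb(fc)
    g = t**(order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def ghc_like_features(x, fs, n_channels=32, fmin=100.0,
                      frame_len=0.025, frame_shift=0.010, n_ceps=13):
    """Gammatone filterbank -> simplified IHC stage -> frame-level cepstra."""
    # ERB-rate-spaced centre frequencies between fmin and fs/2
    erb_lo, erb_hi = 21.4 * np.log10(4.37e-3 * np.array([fmin, fs / 2.0]) + 1.0)
    cfs = (10**(np.linspace(erb_lo, erb_hi, n_channels) / 21.4) - 1.0) / 4.37e-3

    # Lowpass filter standing in for hair-cell membrane smoothing (~1 kHz cutoff)
    b_lp, a_lp = butter(2, 1000.0 / (fs / 2.0))

    envelopes = []
    for fc in cfs:
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")  # cochlear filtering
        y = np.maximum(y, 0.0)                                  # half-wave rectification (IHC)
        envelopes.append(lfilter(b_lp, a_lp, y))                # neural-style smoothing
    E = np.stack(envelopes)                                     # (channels, samples)

    # Frame the smoothed envelopes, log-compress, and decorrelate with a DCT
    win, hop = int(frame_len * fs), int(frame_shift * fs)
    n_frames = 1 + (E.shape[1] - win) // hop
    frames = np.stack([E[:, i * hop:i * hop + win].mean(axis=1)
                       for i in range(n_frames)])               # (frames, channels)
    return dct(np.log(frames + 1e-8), type=2, axis=1, norm="ortho")[:, :n_ceps]
```

In a setup of this kind, the resulting feature matrix (frames x coefficients) would be passed, typically with delta and acceleration coefficients appended, to an HMM recognizer in the same way MFCCs are.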

[1]  Khaled Assaleh,et al.  A wavelet- and neural network-based voice system for a smart wheelchair control , 2011, J. Frankl. Inst..

[2]  Andrzej Cichoń,et al.  Application of a Phase Resolved Partial Discharge Pattern Analysis for Acoustic Emission Method in High Voltage Insulation Systems Diagnostics , 2018 .

[3]  Brian R Glasberg,et al.  Derivation of auditory filter shapes from notched-noise data , 1990, Hearing Research.

[4]  Hamid Hassanpour,et al.  A self-tuning hybrid active noise control system , 2012, J. Frankl. Inst..

[5]  Guy J. Brown,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .

[6]  Stephanie Seneff,et al.  A computational model for the peripheral auditory system: Application of speech recognition research , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Björn W. Schuller,et al.  Combining Long Short-Term Memory and Dynamic Bayesian Networks for Incremental Emotion-Sensitive Artificial Listening , 2010, IEEE Journal of Selected Topics in Signal Processing.

[8]  Jhing-Fa Wang,et al.  Threshold-Based Noise Detection and Reduction for Automatic Speech Recognition System in Human-Robot Interactions , 2018, Sensors.

[9]  Richard M. Stern,et al.  Robust Speech Recognition Based on Binaural Auditory Processing , 2017, INTERSPEECH.

[10]  Björn W. Schuller,et al.  Deep Learning for Environmentally Robust Speech Recognition , 2017, ACM Trans. Intell. Syst. Technol..

[11]  Malcolm Slaney,et al.  An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank , 1997 .

[12]  Yi Jiang,et al.  Auditory features based on Gammatone filters for robust speech recognition , 2013, 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013).

[13]  A. V. Schaik,et al.  A Silicon Representation of the Meddis Inner Hair Cell Model , 2000 .

[14]  Adam Glowacz,et al.  Diagnostics of Rotor Damages of Three-Phase Induction Motors Using Acoustic Signals and SMOFS-20-EXPANDED , 2016 .

[15]  Marcello Pagano,et al.  Principles of Biostatistics , 1992 .

[16]  Richard Lippmann,et al.  A comparison of signal processing front ends for automatic word recognition , 1995, IEEE Trans. Speech Audio Process..

[17]  Kanji Ono,et al.  Review on Structural Health Evaluation with Acoustic Emission , 2018, Applied Sciences.

[18]  K. Davis,et al.  Automatic Recognition of Spoken Digits , 1952 .

[19]  Adam Glowacz,et al.  Fault diagnosis of single-phase induction motor based on acoustic signals , 2019, Mechanical Systems and Signal Processing.

[20]  Jung-Shan Lin,et al.  Employing Robust Principal Component Analysis for Noise-Robust Speech Feature Extraction in Automatic Speech Recognition with the Structure of a Deep Neural Network , 2018, Applied System Innovation.

[21]  R. Meddis Simulation of mechanical to neural transduction in the auditory receptor. , 1986, The Journal of the Acoustical Society of America.

[22]  Laimutis Telksnys,et al.  Analysis of Factors Influencing Accuracy of Speech Recognition , 2015 .

[23]  Tong Zhao,et al.  A Method of Abnormal States Detection Based on Adaptive Extraction of Transformer Vibro-Acoustic Signals , 2017 .

[24]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[25]  Oded Ghitza Auditory models and human performance in tasks related to speech coding and speech recognition , 1994 .

[26]  Elaine Nicpon Marieb,et al.  Study Guide: Human Anatomy & Physiology , 1998 .

[27]  Chang Wen Chen,et al.  Mobile Multimedia Processing: fundamentals, Methods, and Applications , 2010 .

[28]  Weishan Zhang,et al.  Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN , 2017, Sensors.

[29]  Jont B. Allen How do humans process and recognize speech , 1993 .

[30]  Werner Hemmert,et al.  Automatic speech recognition with an adaptation model motivated by auditory processing , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[31]  Yifan Gong,et al.  Speech recognition in noisy environments: A survey , 1995, Speech Commun..

[32]  Yongqiang Wang,et al.  An investigation of deep neural networks for noise robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[33]  DeLiang Wang,et al.  An auditory-based feature for robust speech recognition , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[34]  Kazuya Takeda,et al.  Robust In-Car Speech Recognition Based on Nonlinear Multiple Regressions , 2007, EURASIP J. Adv. Signal Process..

[35]  Richard Lippmann,et al.  Speech recognition by machines and humans , 1997, Speech Commun..

[36]  R. Patterson,et al.  Complex Sounds and Auditory Images , 1992 .

[37]  Quoc V. Le,et al.  Recurrent Neural Networks for Noise Reduction in Robust ASR , 2012, INTERSPEECH.

[38]  Richard M. Stern,et al.  Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[39]  Roberto Togneri,et al.  Perceptual features for automatic speech recognition in noisy environments , 2009, Speech Commun..

[40]  Sen M. Kuo,et al.  Principle and applications of asymmetric crosstalk-resistant adaptive noise canceler , 2000, J. Frankl. Inst..

[41]  Qing Wang,et al.  Application of keywords speech recognition in agricultural voice information system , 2010, 2010 Second International Conference on Computational Intelligence and Natural Computing.

[42]  Huiming Yang,et al.  Experimental Investigation on Influence Factors of Acoustic Emission Activity in Coal Failure Process , 2018, Energies.

[43]  Harvey Fletcher,et al.  The nature of speech and its interpretation , 1922 .

[44]  Dariusz Mika,et al.  Advanced time-frequency representation in voice signal analysis , 2018 .

[45]  Juan Emilio Noriega-Linares,et al.  On the Application of the Raspberry Pi as an Advanced Acoustic Sensor Network for Noise Monitoring , 2016 .

[46]  Volker Hohmann,et al.  Acoustic features for speech recognition based on Gammatone filterbank and instantaneous frequency , 2011, Speech Commun..