Fusion Feature Extraction Based on Auditory and Energy for Noise-Robust Speech Recognition

Environmental noise can pose a threat to the stable operation of current speech recognition systems. It is therefore essential to develop a front-end feature set that can identify speech at low signal-to-noise ratios. In this paper, a robust fusion feature is proposed that can fully characterize speech information. To obtain the cochlear filter cepstral coefficients (CFCC), a novel feature is first extracted using a power-law nonlinear function, which simulates the auditory characteristics of the human ear. Speech enhancement is then introduced into the front end of feature extraction, and the extracted features and their first-order differences are combined into new mixed features. An energy feature, the Teager energy operator cepstral coefficient (TEOCC), is also extracted and combined with these mixed features to form the fusion feature set. Principal component analysis (PCA) is then applied to select and optimize the features, and the final feature set is used in a speaker-independent, isolated-word, small-vocabulary speech recognition system. Finally, a comparative speech recognition experiment using a support vector machine (SVM) is designed to verify the advantages of the proposed feature set. The experimental results show that the proposed feature set not only displays a high recognition rate and excellent anti-noise performance in speech recognition, but also fully characterizes the auditory and energy information in the speech signals.
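
Three of the building blocks named in the abstract can be illustrated with a minimal NumPy sketch: the discrete Teager energy operator, the appending of first-order differences to a feature matrix, and PCA-based dimensionality reduction. This is not the paper's implementation. The function names, the (frames, coefficients) matrix layout, and the SVD-based PCA are assumptions made for illustration, and the cochlear filtering, speech enhancement, and cepstral steps that produce CFCC and TEOCC are not shown because the abstract does not specify them.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the first and last samples lack a neighbor.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def add_first_order_difference(features):
    """Append frame-to-frame differences to a (frames, coeffs) feature matrix.

    The first frame's difference is set to zero (an assumed edge convention).
    """
    features = np.asarray(features, dtype=float)
    delta = np.diff(features, axis=0, prepend=features[:1])
    return np.hstack([features, delta])


def pca_reduce(features, n_components):
    """Project mean-centered features onto the top principal directions."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

For a pure sinusoid A cos(Ωn + φ), the operator returns the constant A² sin²(Ω), which is why it is commonly read as a joint measure of amplitude and frequency. In a pipeline like the one described, the reduced feature matrix would then be passed to a classifier such as an SVM.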
