Study of robust feature extraction techniques for speech recognition system

Automatic Speech Recognition (ASR) systems give good results under restricted conditions but perform poorly in noisy conditions. The main aim of ASR research is for a machine to recognize the entire raw input signal with 100% accuracy in real time. In the presence of noise, audio-visual features play a vital role in ASR systems. This paper summarizes various robust feature extraction techniques and studies their performance on raw speech signals in automatic speech recognition. We also give an overview of some recently proposed speech recognition methods, illustrating their pros and cons and their detailed computational steps in comparison with other well-known techniques.
