Techniques for Feature Extraction In Speech Recognition System : A Comparative Study

The time-domain waveform of a speech signal carries all of the auditory information. From a phonological point of view, however, little can be said on the basis of the waveform itself. Past research in mathematics, acoustics, and speech technology has provided many methods for converting the raw data into information that can be interpreted meaningfully. To extract statistically relevant information from incoming data, mechanisms are needed that reduce each segment of the audio signal to a relatively small number of parameters, or features. These features should describe each segment in such a characteristic way that similar segments can be grouped together by comparing their features. There are many ways to describe the speech signal in terms of parameters, each with its own strengths and weaknesses; this paper presents some of the most widely used methods and discusses their importance.
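
To make the idea of reducing each segment to a small set of features concrete, the following is a minimal sketch, not taken from the paper, of MFCC-based feature extraction; it assumes the third-party librosa library and hypothetical file names, and the distance function only illustrates how segments might be compared through their features.

```python
# Minimal sketch of frame-level feature extraction (assumes the librosa library
# is installed; file names below are hypothetical placeholders).
import numpy as np
import librosa

def extract_mfcc(path, n_mfcc=13):
    """Reduce each short analysis frame of a waveform to a small MFCC feature vector."""
    y, sr = librosa.load(path, sr=None)                      # time-domain waveform
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, n_frames)
    return mfcc.T                                            # one n_mfcc-dim vector per frame

def segment_distance(feats_a, feats_b):
    """Compare two segments via the Euclidean distance between their mean feature vectors."""
    return np.linalg.norm(feats_a.mean(axis=0) - feats_b.mean(axis=0))

# Usage (hypothetical files): acoustically similar segments should yield a small distance.
# d = segment_distance(extract_mfcc("segment_a.wav"), extract_mfcc("segment_b.wav"))
```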
