A scale-rate filter selection method in the spectro-temporal domain for phoneme classification

Studies employing auditory models in speech recognition systems have increased significantly in recent years. In this paper, we propose a new, evolutionarily tuned feature extraction method based on spectro-temporal analysis. In the proposed model, each phoneme has its own subspace, defined by a best scale for the spectral filter and a best rate for the temporal filter. These two parameters are obtained with a genetic cellular automata evolutionary algorithm. The features extracted from each phoneme-specific subspace are classified by a binary one-versus-rest support vector machine. Finally, a multiclass classifier for all phonemes is built by combining these sub-models. The proposed method significantly improves the discrimination of phonemes, especially highly confusable ones. To show the efficiency of the proposed feature sets, they were empirically compared with two baseline models. The relative improvement in classification rate over the state-of-the-art baseline model is about 10% for voiced plosives, unvoiced plosives, and nasals, and 7.38% for front vowels.
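The per-phoneme structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `PhonemeSubModel` class, the `toy_features` extractor, the scale and rate values, and the linear scorers standing in for trained binary SVMs are all hypothetical, and the evolutionary search that selects each scale-rate pair is not shown. The sketch only shows how sub-models tied to different (scale, rate) subspaces might be combined into one multiclass decision by taking the highest one-versus-rest score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Features = List[float]


@dataclass
class PhonemeSubModel:
    """One-versus-rest sub-model tied to a phoneme-specific (scale, rate) subspace."""
    scale: float               # selected spectral scale (assumed unit: cycles/octave)
    rate: float                # selected temporal rate (assumed unit: Hz)
    weights: Sequence[float]   # linear scorer, a stand-in for a trained binary SVM
    bias: float = 0.0

    def score(self, features: Features) -> float:
        # Signed decision value; larger means "more likely this phoneme".
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def classify(
    segment: Sequence[float],
    sub_models: Dict[str, PhonemeSubModel],
    extract_features: Callable[[Sequence[float], float, float], Features],
) -> str:
    """Score the segment in every phoneme's own subspace and return the best label."""
    scores = {}
    for phoneme, model in sub_models.items():
        # Each sub-model sees features extracted with its own scale and rate.
        feats = extract_features(segment, model.scale, model.rate)
        scores[phoneme] = model.score(feats)
    return max(scores, key=scores.get)


def toy_features(segment: Sequence[float], scale: float, rate: float) -> Features:
    # Hypothetical placeholder. A real system would filter an auditory
    # spectrogram with spectral and temporal modulation filters instead.
    mean = sum(segment) / len(segment)
    return [mean * scale, mean / rate]


# Illustrative values only; they are not taken from the paper.
sub_models = {
    "b": PhonemeSubModel(scale=2.0, rate=8.0, weights=[1.0, 0.0]),
    "d": PhonemeSubModel(scale=1.0, rate=16.0, weights=[0.5, 0.0]),
}

predicted = classify([1.0, 3.0], sub_models, toy_features)
```

Taking the maximum decision value is one common way to combine one-versus-rest classifiers; the abstract does not state which combination rule the authors used.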
