Enhancing the Feature Extraction Process for Automatic Speech Recognition with Fractal Dimensions

Mel-frequency cepstral coefficients (MFCCs) are a standard feature set for automatic speech recognition (ASR), but they fail to capture part of the dynamics of speech. Because speech is nonlinear in nature, the extra information provided by nonlinear features could be especially useful when training data are scarce or when the ASR task is very complex. In this paper, the fractal dimension of the observed time series is combined with the traditional MFCCs in the feature vector in order to improve the performance of two different ASR systems. The first is a simple Chinese digit recognition system trained on very few examples; the second is a large-vocabulary ASR system for Broadcast News in Spanish.
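
As an illustration of the idea of combining a fractal dimension with MFCCs, the following is a minimal sketch and not the authors' implementation. It assumes Higuchi's estimator as the fractal dimension algorithm (the abstract does not say which estimator is used), a hypothetical maximum scale `kmax=8`, and MFCCs already computed elsewhere with one row per frame. The function names `higuchi_fd` and `append_fd` are illustrative only.

```python
import numpy as np


def higuchi_fd(x, kmax=8):
    """Estimate the fractal dimension of a 1-D series with Higuchi's method.

    Assumption: kmax=8 is an arbitrary illustrative choice, not a value
    taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            # Subsample the series with offset m and step k.
            idx = m + np.arange(n_steps + 1) * k
            dist = np.abs(np.diff(x[idx])).sum()
            # Normalised curve length for this offset and scale.
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        if lengths and np.mean(lengths) > 0:
            log_inv_k.append(np.log(1.0 / k))
            log_len.append(np.log(np.mean(lengths)))
    if len(log_len) < 2:
        # Convention chosen for this sketch: a flat (e.g. silent) frame
        # is treated as a smooth curve of dimension 1.
        return 1.0
    # The fractal dimension is the slope of log L(k) against log(1/k).
    slope, _ = np.polyfit(log_inv_k, log_len, 1)
    return float(slope)


def append_fd(mfcc, frames, kmax=8):
    """Append one fractal-dimension value per frame to an MFCC matrix.

    mfcc:   array of shape (n_frames, n_coeffs), computed elsewhere
    frames: array of shape (n_frames, frame_len), raw samples per frame
    """
    fd = np.array([higuchi_fd(f, kmax) for f in frames])
    return np.hstack([np.asarray(mfcc, dtype=float), fd[:, None]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((5, 400))   # placeholder audio frames
    mfcc = rng.standard_normal((5, 13))      # placeholder MFCCs
    print(append_fd(mfcc, frames).shape)
```

In this sketch the fractal dimension simply becomes one additional column of the feature matrix, so a recognizer that already consumes MFCC vectors would only need its input dimension increased by one.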
