Topological invariants as speech features for automatic speech recognition

The article presents topological invariants as speech features for speech recognition systems based on hidden Markov models (HMMs). It gives a short introduction to the mathematical concepts of topological invariants and space symmetries as they apply to the speech recognition problem, including a basic overview of the relevant auditory characteristics and their modelling, in order to identify possible symmetries and invariants. Once the concept is derived, several modifications vital for HMM systems, such as dimensionality reduction, within-class feature decorrelation and a signal plane rotation, are presented and evaluated on a real system. The final system is evaluated and compared with other features using both context-dependent and context-independent models. Tests were carried out on a professional speech database, where the achieved accuracies reached up to 97.7%, 98.7% and 98.9% for the string of digits, application words and isolated digits tests, respectively.
