Features extraction and training strategies in continuous speech recognition for Romanian language

This paper describes continuous speech recognition experiments for the Romanian language using hidden Markov models (HMMs). Three questions are addressed: the design of a new front-end that revisits linear prediction, the improvement of recognition rates through context-dependent modeling, and the evaluation of training strategies that achieve speaker independence through speaker selection for training rather than through speaker adaptation procedures. The experiments extend the initial system with a promising front-end based on perceptual linear prediction (PLP) coefficients. In recognition performance this front-end ranks second, close to the first-ranked front-end based on mel-frequency cepstral coefficients (MFCC) and far better than the last-ranked front-end, which is based on simple linear prediction. The implemented context-dependent modeling algorithm improves recognition rates in all situations. Gender-based speaker selection improves the recognition rate under certain conditions and shows good generalization properties, especially when training on the male speakers database.
