The use of variable frame rate analysis in speech recognition

Abstract: The application of a simple variable frame rate analysis to a continuous speech recognition system based on phone-level hidden Markov models is described. Results are presented which show that, with standard three-state models, adding the variable frame rate analysis considerably improves performance, bringing it close to that obtained using simple duration-sensitive models.
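The abstract does not specify how the variable frame rate analysis is computed. As an illustration only, one common simple scheme keeps a frame only when it differs from the last retained frame by more than a threshold, so steady regions of speech contribute fewer frames. The sketch below assumes that scheme; the function name `variable_frame_rate`, the Euclidean distance measure, and the `threshold` parameter are all assumptions and are not taken from the paper.

```python
import math


def variable_frame_rate(frames, threshold):
    """Return indices of the frames retained by a simple VFR scheme.

    Assumed scheme (not confirmed by the source): a frame is kept only
    if its Euclidean distance from the last *retained* frame exceeds
    `threshold`. The first frame is always kept.
    """
    kept = []
    last = None
    for i, frame in enumerate(frames):
        if last is None or math.dist(frame, last) > threshold:
            kept.append(i)
            last = frame
    return kept


# Toy 2-dimensional "feature vectors": two steady regions and one jump.
frames = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0),
          (1.0, 0.0), (1.05, 0.0), (3.0, 0.0)]
kept = variable_frame_rate(frames, threshold=0.5)
print(kept)
```

In a scheme like this, the number of discarded frames between retained ones is sometimes kept as side information, since it carries duration cues. Whether the system described here does so is not stated in the abstract.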
