Enhanced control and estimation of parameters for a telephone based isolated digit recognizer

This paper studies discriminative techniques for a telephone-based isolated digit recognizer, with the aim of reducing system complexity. Combining linear discriminant analysis (LDA) with minimum error classification (MEC) training improves system performance while lowering the cost of both training and application. Experiments are performed on an isolated digit database recorded over public telephone lines from approximately 700 speakers. A single linear transformation matrix based on LDA makes it possible to use density models that do not model variances explicitly while still achieving a high recognition rate. MEC training is found to perform best when the number of system parameters is small. For such a configuration, the combination of the two methods reduced the error rate by up to 80%.
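The following is a minimal sketch, not the paper's implementation, of why an LDA transform can make explicit variance modeling unnecessary. It assumes plain feature vectors with class labels rather than HMM states, and the names `fit_lda`, `class_means`, and `nearest_mean_predict` are hypothetical. The transform is scaled so that the pooled within-class covariance becomes the identity, after which a class can be scored by squared Euclidean distance to its mean alone. MEC training is not shown.

```python
import numpy as np


def fit_lda(X, y, n_components):
    """Return W so that X @ W has identity pooled within-class covariance."""
    classes = np.unique(y)
    dim = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((dim, dim))  # pooled within-class scatter
    Sb = np.zeros((dim, dim))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw /= len(X) - len(classes)

    # Step 1: whiten the within-class covariance (assumes Sw is positive definite).
    evals, evecs = np.linalg.eigh(Sw)
    whiten = evecs / np.sqrt(evals)  # scales column j by 1 / sqrt(evals[j])

    # Step 2: keep the directions with the largest between-class scatter.
    b_evals, b_evecs = np.linalg.eigh(whiten.T @ Sb @ whiten)
    order = np.argsort(b_evals)[::-1][:n_components]
    return whiten @ b_evecs[:, order]


def class_means(Z, y):
    """Return the sorted class labels and the mean vector of each class."""
    labels = np.unique(y)
    means = np.stack([Z[y == c].mean(axis=0) for c in labels])
    return labels, means


def nearest_mean_predict(Z, means, labels):
    """Classify by squared Euclidean distance to each class mean.

    This corresponds to Gaussian densities with a shared unit covariance,
    so no variance parameters are stored per class.
    """
    d2 = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(d2, axis=1)]
```

A typical use would be `W = fit_lda(X, y, k)`, then `labels, means = class_means(X @ W, y)`, and finally `nearest_mean_predict(X_new @ W, means, labels)`. Only one matrix and one mean vector per class are stored, which reflects the reduced-parameter setting the abstract describes.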
