Innovative approaches for large vocabulary name recognition

Automatic name dialing is a practical and interesting application of speech recognition on telephony systems. The IBM name recognition system is a large vocabulary, speaker independent system currently in use for reaching IBM employees in the United States. We present some innovative algorithms that improve name recognition accuracy. Unlike transcription tasks, such as the Switchboard task, recognition of names poses a variety of different problems. Several of these problems arise from the fact that foreign names are hard to pronounce for speakers who are not familiar with the names and that there are no standardized methods for pronouncing proper names. Noise robustness is another very important factor as these calls are typically made in noisy environments, such as from a car, cafeteria, airport, etc. and over different kinds of cellular and land-line telephone channels. We have performed a systematic analysis of the speech recognition errors and tackled the issues separately with techniques ranging from weighted speaker clustering, massive adaptation, rapid and unsupervised adaptation methods to pronunciation modeling methods. We find that the decoding accuracy can be improved significantly (28% relative) in this manner.

[1]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[2]  Michael Picheny,et al.  Robust methods for using context-dependent features and models in a continuous speech recognizer , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[3]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[4]  Michael Picheny,et al.  Speaker adaptation based on pre-clustering training speakers , 1997, EUROSPEECH.

[5]  S. Steinmetz,et al.  Random House Webster's unabridged dictionary , 1997 .

[6]  Yu-Hung Kao,et al.  Speaker-independent name dialing with out-of-vocabulary rejection , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Juan Carlos Torrecilla,et al.  Name dialing using final user defined vocabularies in mobile (GSM and TACS) and fixed telephone networks , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[8]  Bhuvana Ramabhadran,et al.  Acoustics-only based automatic phonetic baseform generation , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[9]  Yifan Gong,et al.  Speaker-dependent name dialing in a car environment with out-of-vocabulary rejection , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[10]  Weighted pairwise scatter to improve linear discriminant analysis , 2000, INTERSPEECH.

[11]  Michael Picheny,et al.  Rapid adaptation using penalized-likelihood methods , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).