Performance of an HMM speech recognizer using a real-time tracking microphone array as input

This correspondence reports results for a real-time, tracking microphone array used as the input to a hidden Markov model (HMM)-based connected-alphadigit speech recognizer. For a talker in the near field of the array (within 0.5 m), recognition performance approaches that obtained with a close-talking microphone.
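
The abstract does not say how the array combines its channels into a single recognizer input. As a hedged illustration only, the classic delay-and-sum beamformer with a near-field (spherical-wavefront) steering model shows one way a tracking array can do this once the talker's position is known. Near the array, each delay is computed from the exact source-to-microphone distance rather than from a single arrival angle. The function names `steering_delays` and `delay_and_sum`, the 343 m/s speed of sound, and the rounding to whole-sample delays are assumptions of this sketch, not details of the reported system.

```python
import math

# Assumed speed of sound in air at room temperature (m/s); not from the source.
SPEED_OF_SOUND = 343.0


def steering_delays(mic_positions, source, fs):
    """Hypothetical helper: integer sample delays that time-align a near-field source.

    Uses the spherical-wave model: each microphone's delay comes from its exact
    distance to the source. Nearer microphones receive the wavefront earlier,
    so they are delayed to line up with the farthest microphone.
    """
    dists = [math.dist(m, source) for m in mic_positions]
    d_max = max(dists)
    # Rounded to whole samples for simplicity (an assumption of this sketch).
    return [round((d_max - d) / SPEED_OF_SOUND * fs) for d in dists]


def delay_and_sum(channels, delays):
    """Hypothetical helper: delay each channel by its sample count, then average."""
    n = min(len(c) for c in channels)
    out = [0.0] * n
    for ch, k in zip(channels, delays):
        for i in range(n):
            j = i - k  # delayed signal: y[i] = x[i - k]
            if 0 <= j < len(ch):
                out[i] += ch[j]
    return [v / len(channels) for v in out]
```

In a tracking array the source position would be re-estimated as the talker moves and the delays recomputed; this sketch covers only a single, fixed position. For example, `delay_and_sum([[0, 0, 1, 0, 0], [0, 1, 0, 0, 0]], [0, 1])` aligns the two impulses and returns `[0.0, 0.0, 1.0, 0.0, 0.0]`.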
