An efficient framework for robust mobile speech recognition services

A distributed framework for implementing automatic speech recognition (ASR) services on wireless mobile devices is presented. The framework is shown to scale easily to a large number of mobile users connected over a wireless network and to degrade gracefully under peak loads. The importance of robust acoustic modeling techniques is demonstrated for scenarios where equipping mobile devices with specialized acoustic transducers is impractical. It is shown that unsupervised acoustic normalization and adaptation techniques can reduce speech recognition word error rate (WER) by 30 percent, and that an unsupervised paradigm for updating and applying these robust modeling algorithms can be implemented efficiently within the distributed framework.
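The abstract refers to unsupervised acoustic normalization and adaptation without specifying the algorithms. As a minimal illustrative sketch, the snippet below implements per-utterance cepstral mean and variance normalization, a common unsupervised normalization step that removes stationary channel effects without requiring any transcriptions; the function name and the random "feature" data are hypothetical, not taken from the paper.

```python
import numpy as np

def cepstral_mean_variance_normalization(features):
    """Normalize a (frames x coefficients) cepstral feature matrix to
    zero mean and unit variance per coefficient. This is unsupervised:
    the statistics come from the utterance itself, so no transcription
    or speaker label is needed."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0.0] = 1.0  # guard against constant coefficients
    return (features - mean) / std

# Hypothetical usage: stand-in for 13-dimensional cepstra over 200 frames.
utterance = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(200, 13))
normalized = cepstral_mean_variance_normalization(utterance)
```

Because the statistics are estimated per utterance, the same routine can run on the server side of a distributed ASR service with no client-side calibration, which is in the spirit of the unsupervised paradigm the paper describes.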
