Unified language modeling using finite-state transducers with first applications

In this paper, we investigate a weighted finite-state transducer approach to language modeling for speech recognition applications. We explore a unified framework for conversational speech recognition that combines the benefits of grammars, n-gram models, and class-based language models with the flexibility of dynamic data and the potential for integrating semantics. Using a virtual personal assistant application, we present first applications and recognition results for out-of-grammar handling and for the integration of class-based, weighted, dynamic data into this framework.
