Improving command and control speech recognition on mobile devices: using predictive user models for language modeling

Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions restricted to a fixed grammar of pre-defined phrases. While C&C interaction has been commonplace in telephony and accessibility systems for many years, only recently have mobile devices acquired the memory and processing capacity to support client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse at the end of every workday, the language model could be adapted to weight the spouse's contact entry more heavily than other contacts during that time. In this paper, we describe and assess statistical models learned from a large population of users for predicting the next user command of a commercial C&C application. We explain how these models were used for language modeling and evaluate their performance in terms of task completion. The best performing model achieved a 26% relative reduction in error rate compared to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via online updating of model parameters based on individual user data. Personalization further reduced the error rate significantly, yielding an additional 5% relative reduction.
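To make the idea of online model updating at a learning rate concrete, the Python sketch below blends a population prior over contacts with an individual user's observed calls via an exponential moving average. This is a minimal illustration under assumed structure: the class, parameter names, and update rule are hypothetical, not the paper's actual models or implementation.

    class PersonalizedContactModel:
        """Illustrative sketch: start from a population prior over contacts
        and shift probability mass toward the individual user's observed
        behavior at a configurable learning rate."""

        def __init__(self, population_prior, learning_rate=0.1):
            # population_prior: dict mapping contact -> probability (sums to 1).
            self.probs = dict(population_prior)
            self.learning_rate = learning_rate

        def update(self, observed_contact):
            # Exponential moving average: decay every probability, then add
            # the learning-rate mass to the contact the user actually chose.
            # Total probability mass remains 1 after each update.
            for contact in self.probs:
                self.probs[contact] *= 1.0 - self.learning_rate
            self.probs[observed_contact] = (
                self.probs.get(observed_contact, 0.0) + self.learning_rate
            )

        def weight(self, contact):
            # Weight that would be handed to the recognizer's grammar
            # or language model for this contact.
            return self.probs.get(contact, 0.0)

    # Usage: a user who repeatedly calls "spouse" gradually pulls the
    # model's probability mass toward that contact.
    model = PersonalizedContactModel(
        {"spouse": 0.25, "boss": 0.25, "mom": 0.25, "dentist": 0.25},
        learning_rate=0.2,
    )
    for _ in range(5):
        model.update("spouse")
    print(round(model.weight("spouse"), 3))  # 0.754, up from the 0.25 prior

The learning rate controls the trade-off the abstract describes: a higher rate adapts quickly to an individual's habits but forgets the population prior faster, while a lower rate personalizes more conservatively.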
