A voice search approach to replying to SMS messages in automobiles

Automotive infotainment systems now let drivers hear incoming Short Message Service (SMS) text messages read aloud using text-to-speech. However, the question of how best to let users respond to these messages using speech recognition remains unsettled. In this paper, we propose a robust voice search approach to replying to SMS messages based on template matching. The templates are empirically derived from a large SMS corpus, and matches are retrieved using a vector space model. In evaluating SMS replies within the acoustically challenging environment of automobiles, the voice search approach consistently outperformed relying solely on the recognition results of a statistical language model or a probabilistic context-free grammar. For SMS replies covered by our templates, the approach achieved task completion as high as 89.7% when the top five reply candidates were evaluated.
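
The retrieval step described above can be illustrated with a minimal sketch. It assumes TF-IDF term weighting and cosine similarity, which are common choices for a vector space model but are not specified in this abstract. The template list, the function names, and the misrecognized input are hypothetical and serve only as illustration; the templates in the paper are derived from a large SMS corpus.

```python
import math
from collections import Counter

# Hypothetical reply templates (illustration only, not taken from the paper).
TEMPLATES = [
    "i am running late",
    "i will call you later",
    "see you soon",
    "i am driving right now",
    "ok sounds good",
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for each term in the templates."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; terms unseen in the templates are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_templates(hypothesis, templates, k=5):
    """Return up to k templates ranked by similarity to a recognition hypothesis."""
    idf = build_idf(templates)
    query = vectorize(hypothesis, idf)
    scored = [(cosine(query, vectorize(t, idf)), t) for t in templates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0.0]


if __name__ == "__main__":
    # A hypothetical recognition result containing one misrecognized word.
    for candidate in top_templates("i am runing late", TEMPLATES):
        print(candidate)
```

Returning a ranked list instead of a single best match mirrors the evaluation over the top five reply candidates: a recognition result with errors can still share enough terms with the intended template for that template to appear among the candidates.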
