A language-modeling approach to inverse text normalization and data cleanup for multimodal voice search applications

In this paper we address two related challenges in multimodal local search applications on mobile devices: first, correctly displaying business names, and second, harvesting language model training data from an inconsistently labeled corpus. We investigate how common text normalization and the quality of the language model training corpus affect the accuracy of displayed results. We propose a new language model framework that eliminates the need for explicit inverse text normalization. The same framework can be applied to sift through corrupted language model training data. Our new language model is 25% more accurate while being 25% smaller.
