Use of geographical meta-data in ASR language and acoustic models

In speech recognition applications such as directory assistance (DA) and voice search, the query distribution depends on the caller's location. This motivates research on query models conditioned on the user's location, which we call local models. We describe and evaluate methods for estimating local models at various degrees of spatial granularity, applied to the recognition of city-state utterances (a sub-task of DA) and of business listings spoken over iPhones in a nationwide business-listing voice-search service. Our local language models improve city-state recognition accuracy by 2.4% absolute (a 32% relative error reduction) and voice-search accuracy by 2.2% absolute (7% relative).
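To make the idea of a location-conditioned query model concrete, the sketch below shows one common way such a model can be realized: keeping per-location query counts (at some chosen spatial granularity, e.g. city or state) and interpolating them with a nationwide model. This is a minimal illustrative assumption, not necessarily the estimation method used in the paper; the class name, the mixing weight `lam`, and the unigram-style counting are all hypothetical.

```python
# Minimal sketch (assumed, not the authors' implementation) of a "local" query
# model: per-location query counts linearly interpolated with a global model.
from collections import defaultdict


class LocalQueryModel:
    def __init__(self, lam: float = 0.7):
        self.lam = lam  # weight on the local (location-specific) model; hypothetical value
        self.local_counts = defaultdict(lambda: defaultdict(int))  # location cell -> query -> count
        self.global_counts = defaultdict(int)                      # query -> count (nationwide)

    def add_query(self, cell: str, query: str) -> None:
        """Accumulate a training query observed from a caller located in `cell`."""
        self.local_counts[cell][query] += 1
        self.global_counts[query] += 1

    def prob(self, cell: str, query: str) -> float:
        """Interpolated probability of `query` given the caller's location cell."""
        local = self.local_counts.get(cell, {})
        local_total = sum(local.values())
        global_total = sum(self.global_counts.values())
        p_local = local.get(query, 0) / local_total if local_total else 0.0
        p_global = self.global_counts.get(query, 0) / global_total if global_total else 0.0
        return self.lam * p_local + (1.0 - self.lam) * p_global
```

In practice, the interpolation weight and the granularity of the location cells would be tuned on held-out data, since coarser cells give more robust counts while finer cells capture more local bias.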
