Geo-location for voice search language modeling

We investigate the benefit of augmenting with geo-location information the language model used in speech recognition for voice-search. We observe reductions in perplexity of up to 15% relative on test sets obtained from both typed query data, as well as transcribed voice search data; on a subset of the test data consisting of “local” queries — search results displaying a restaurant, some address, or similar — the reduction in perplexity is even higher, up to 30% relative. Automatic speech recognition experiments confirm the utility of geo-location information for improved language modeling. Significant reductions in word error rate are observed both on general voice search traffic, as well as “local” traffic, up to 2% and 8% relative, respectively.

[1]  Johan Schalkwyk,et al.  Deploying GOOG-411: Early lessons in data, measurement, and testing , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Geoffrey Zweig,et al.  Anatomy of an extremely fast LVCSR decoder , 2005, INTERSPEECH.

[3]  Geoffrey Zweig,et al.  Live search for mobile:Web services by voice on the cellphone , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4]  Johan Schalkwyk,et al.  Query language modeling for voice search , 2010, 2010 IEEE Spoken Language Technology Workshop.

[5]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[6]  Johan Schalkwyk,et al.  OpenFst: A General and Efficient Weighted Finite-State Transducer Library , 2007, CIAA.

[7]  Shrikanth S. Narayanan,et al.  VPQ: a spoken language interface to large scale directory information , 1998, ICSLP.

[8]  Andreas Stolcke,et al.  Entropy-based Pruning of Backoff Language Models , 2000, ArXiv.

[9]  Thorsten Brants,et al.  Distributed Language Models , 2009, NAACL.

[10]  Sheng Chang,et al.  Modalities and demographics in voice search: Learnings from three case studies , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[11]  Ciprian Chelba,et al.  Empirical Exploration of Language Modeling for the google.com Query Stream as Applied to Mobile Voice Search , 2013 .

[12]  Atsushi Nakamura,et al.  Efficient WFST-Based One-Pass Decoding With On-The-Fly Hypothesis Rescoring in Extremely Large Vocabulary Continuous Speech Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.