Using Knowledge Graph And Search Query Click Logs in Statistical Language Model For Speech Recognition

This paper demonstrates how Knowledge Graph (KG) and Search Query Click Logs (SQCL) can be leveraged in statistical language models to improve named entity recognition for online speech recognition systems. Due to the missing in the training data, some named entities may be recognized as other common words that have the similar pronunciation. KG and SQCL cover comprehensive and fresh named entities and queries that can be used to mitigate the wrong recognition. First, all the entities located in the same area in KG are clustered together, and the queries that contain the entity names are selected from SQCL as the training data of a geographical statistical language model for each entity cluster. These geographical language models make the unseen named entities less likely to occur during the model training, and can be dynamically switched according to the user location in the recognition phase. Second, if any named entities are identified in the previous utterances within a conversational dialog, the probability of the n-best word sequence paths that contain their related entities will be increased for the current utterance by utilizing the entity relationships from KG and SQCL. This way can leverage the long-term contexts within the dialog. Experiments for the proposed approach on voice queries from a spoken dialog system yielded a 12.5% relative perplexity reduction in the language model measurement, and a 1.1% absolute word error rate reduction in the speech recognition measurement.

[1]  Jean-Luc Gauvain,et al.  Using information retrieval methods for language model adaptation , 2001, INTERSPEECH.

[2]  Bernard Mérialdo,et al.  A Dynamic Language Model for Speech Recognition , 1991, HLT.

[3]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[4]  Geoffrey Zweig,et al.  Context dependent recurrent neural network language model , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).

[5]  Eric Brill,et al.  Automatically Harvesting Katakana-English Term Pairs from Search Engine Query Logs , 2001, NLPRS.

[6]  Diamantino Caseiro,et al.  Use of geographical meta-data in ASR language and acoustic models , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[7]  Karthik Visweswariah,et al.  Utilizing relationships between named entities to improve speech recognition in dialog systems , 2010, 2010 IEEE Spoken Language Technology Workshop.

[8]  Simon King,et al.  Named entity extraction from word lattices , 2003, INTERSPEECH.

[9]  Michiel Bacchiani,et al.  Restoring punctuation and capitalization in transcribed speech , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Ronald Rosenfeld,et al.  Trigger-based language models: a maximum entropy approach , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11]  Amanda Stent,et al.  Geo-Centric Language Models for Local Business Voice Search , 2009, HLT-NAACL.

[12]  Anthony J. Robinson,et al.  Language model adaptation using mixtures and an exponentially decaying cache , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  Xuedong Huang,et al.  Improved topic-dependent language modeling using information retrieval techniques , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[14]  Yoshinori Sagisaka,et al.  Speech recognition of a named entity , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[15]  Renato De Mori,et al.  A Cache-Based Natural Language Model for Speech Recognition , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[17]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[18]  Katsuhito Sudoh,et al.  Incorporating Speech Recognition Confidence into Discriminative Named Entity Recognition of Speech Data , 2006, ACL.

[19]  Michael Picheny,et al.  Turn-Based Language Modeling for spoken dialog systems , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[20]  Gökhan Tür,et al.  Using a knowledge graph and query click logs for unsupervised learning of relation detection , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21]  Gökhan Tür,et al.  Extending domain coverage of language understanding systems via intent transfer between domains using knowledge graphs and search query click logs , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).