Word Relevance Modeling for Speech Recognition

Language models for speech recognition tend to be brittle across domains, since their performance is vulnerable to changes in the genre or topic of the text on which they are trained. A number of adaptation methods, exploiting either lexical co-occurrence or topic cues, have been developed to mitigate this problem with varying degrees of success. Among them, a more recent thread of work is the relevance modeling approach, which has shown promise in capturing the lexical co-occurrence relationship between the entire search history and an upcoming word. However, a potential downside of such an approach is the need to resort to a retrieval procedure to obtain relevance information, which is usually complex and time-consuming in practical applications. In this paper, we propose a word relevance modeling framework that introduces a novel use of relevance information for dynamic language model adaptation in speech recognition. It not only inherits the merits of several existing techniques but also provides a flexible yet systematic way to render the lexical, topical, and proximity relationships between the search history and the upcoming word. Experiments on large vocabulary continuous speech recognition demonstrate the performance merits of the methods instantiated from this framework over several existing methods.
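To make the general idea concrete, the sketch below illustrates relevance-model-based language model adaptation in the style the abstract describes: a relevance model is estimated from a set of pseudo-relevant documents weighted by how likely each document is to have generated the search history, and the result is interpolated with a background model. This is an illustrative RM1-style sketch, not the paper's exact formulation; the document set, smoothing constant, and interpolation weight are all hypothetical.

```python
from collections import Counter

def unigram(tokens):
    """Maximum-likelihood unigram model over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def relevance_model(history, documents):
    """P_rel(w | H) proportional to sum over docs D of P(w | D) * prod_{h in H} P(h | D)."""
    vocab = {w for d in documents for w in d}
    scores = {w: 0.0 for w in vocab}
    for doc in documents:
        p_doc = unigram(doc)
        # Query likelihood of the search history under this document
        # (crude floor smoothing for unseen history words).
        qlik = 1.0
        for h in history:
            qlik *= p_doc.get(h, 1e-6)
        for w in vocab:
            scores[w] += p_doc.get(w, 0.0) * qlik
    norm = sum(scores.values())
    return {w: s / norm for w, s in scores.items()} if norm > 0 else scores

def adapted_prob(word, history, background, documents, lam=0.7):
    """Interpolate a background model with the history-conditioned relevance model."""
    p_rel = relevance_model(history, documents)
    return lam * background.get(word, 1e-6) + (1 - lam) * p_rel.get(word, 0.0)
```

In practice the background model would be an n-gram or neural LM and the documents would come from a retrieval step over a large corpus; the paper's contribution is precisely avoiding that retrieval step at recognition time, which this sketch does not reflect.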
