Similar Word Model for Unfrequent Word Enhancement in Speech Recognition

The popular n-gram language model (LM) performs poorly on infrequent words, because their contexts are only sparsely observed in the training data. Conventional approaches such as class-based LMs address this by pre-defining sharing structures (e.g., word classes). However, defining such structures requires prior knowledge, and context sharing based on them is generally coarse and inaccurate. This paper presents a novel similar word model to enhance infrequent words. In principle, we enrich the context of an infrequent word by borrowing context information from some "similar words." Compared with conventional class-based methods, this approach offers fine-grained context sharing by referring to the words that best match the target word. It is also more flexible, because no sharing structures need to be defined by hand. Experiments on a large-scale Chinese speech recognition task demonstrated that the similar word approach significantly improves performance on infrequent words while leaving performance on general tasks almost unchanged.
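The idea of borrowing context from similar words can be illustrated with a minimal bigram sketch. This is not the paper's algorithm, whose details are not given here. The helpers `bigram_probs` and `borrow_context`, the interpolation weight `weight`, and the hand-set similarity scores (which in practice might come from, e.g., cosine similarity of word embeddings) are all hypothetical. For every history `h`, the sketch estimates P(rare | h) as a similarity-weighted average of P(s | h) over the similar words `s`, interpolates it with the original estimate, and renormalizes the distribution.

```python
from collections import defaultdict


def bigram_probs(corpus):
    """Maximum-likelihood bigram estimates P(w | h) from a list of token lists."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        for h, w in zip(tokens, tokens[1:]):
            counts[h][w] += 1
    probs = {}
    for h, nexts in counts.items():
        total = sum(nexts.values())
        probs[h] = {w: c / total for w, c in nexts.items()}
    return probs


def borrow_context(probs, rare, similar, weight=0.5):
    """Hypothetical similar-word enrichment of `rare` as a predicted word.

    similar: maps each similar word to a (hypothetical) similarity score.
    weight:  hypothetical interpolation weight for the borrowed estimate.
    """
    z = sum(similar.values())
    if z <= 0:
        return {h: dict(dist) for h, dist in probs.items()}
    enriched = {}
    for h, dist in probs.items():
        # Similarity-weighted average of P(s | h) over the similar words.
        borrowed = sum(score / z * dist.get(s, 0.0) for s, score in similar.items())
        new = dict(dist)
        if borrowed > 0:
            # Interpolate with the original estimate, then renormalize over h.
            new[rare] = (1 - weight) * dist.get(rare, 0.0) + weight * borrowed
            total = sum(new.values())
            new = {w: p / total for w, p in new.items()}
        enriched[h] = new
    return enriched


corpus = [
    ["i", "drink", "coffee"],
    ["i", "drink", "tea"],
    ["i", "drink", "coffee"],
    ["you", "like", "matcha"],
]
probs = bigram_probs(corpus)
enriched = borrow_context(probs, "matcha", {"coffee": 0.6, "tea": 0.4})
print(probs["drink"])     # "matcha" never follows "drink" in the data
print(enriched["drink"])  # "matcha" now receives mass borrowed from coffee/tea
```

Here "matcha" is never seen after "drink", so the plain bigram model assigns it zero probability in that context. The sketch gives it about 0.21 by borrowing from "coffee" and "tea", while histories where no similar word occurs (e.g., "like") are left unchanged. Unlike a class-based LM, no word classes are defined; each rare word draws only on its own list of similar words.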
