
Similar Word Model for Unfrequent Word Enhancement in Speech Recognition

The popular n-gram language model (LM) performs poorly on infrequent words. Conventional approaches such as class-based LMs address this problem by pre-defining sharing structures (e.g., word classes). However, defining such structures requires prior knowledge, and context sharing based on them is generally inaccurate. This paper presents a novel similar word model to enhance infrequent words. In principle, we enrich the context of an infrequent word by borrowing context information from a set of “similar words.” Compared to conventional class-based methods, this approach offers fine-grained context sharing by referring to the words that best match the target word. It is also more flexible, because no sharing structures need to be defined by hand. Experiments on a large-scale Chinese speech recognition task demonstrate that the similar word approach significantly improves performance on infrequent words while leaving performance on general tasks almost unchanged.
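The abstract describes context borrowing only at a high level. The sketch below illustrates one simple way the idea could be applied to raw bigram counts. It is not the paper's method. It assumes a bigram count table and a hand-supplied map from each infrequent word to (similar word, weight) pairs. It also assumes a hypothetical global factor `scale` that discounts borrowed counts. The names `borrow_bigrams`, `bigram_prob`, `similar`, and `scale` are illustrative only. Probabilities are plain maximum-likelihood estimates with no smoothing or back-off.

```python
from collections import Counter


def borrow_bigrams(bigram_counts, similar, scale=0.5):
    """Return a copy of `bigram_counts` enriched with borrowed pseudo-counts.

    bigram_counts: Counter mapping (w1, w2) -> count.
    similar: hypothetical map, infrequent word -> list of (similar_word, weight).
    scale: hypothetical global discount applied to every borrowed count.
    """
    enriched = Counter(bigram_counts)  # copy; the input table is left unchanged
    for rare, sims in similar.items():
        for sim, weight in sims:
            # Iterate over the ORIGINAL counts so borrowed entries are not re-borrowed.
            for (w1, w2), c in bigram_counts.items():
                if w1 == sim:  # sim as history: "sim w2" lends to "rare w2"
                    enriched[(rare, w2)] += scale * weight * c
                if w2 == sim:  # sim as predicted word: "w1 sim" lends to "w1 rare"
                    enriched[(w1, rare)] += scale * weight * c
    return enriched


def bigram_prob(counts, w1, w2):
    """Unsmoothed maximum-likelihood estimate of P(w2 | w1)."""
    total = sum(c for (a, _), c in counts.items() if a == w1)
    return counts[(w1, w2)] / total if total else 0.0


if __name__ == "__main__":
    counts = Counter({
        ("eat", "apple"): 4,
        ("buy", "apple"): 2,
        ("apple", "pie"): 3,
        ("eat", "durian"): 1,  # "durian" plays the infrequent word
    })
    enriched = borrow_bigrams(counts, {"durian": [("apple", 0.8)]}, scale=0.5)
    print(round(bigram_prob(counts, "eat", "durian"), 3))    # 0.2
    print(round(bigram_prob(enriched, "eat", "durian"), 3))  # 0.394
    print(round(enriched[("durian", "pie")], 3))             # 1.2
```

With these toy counts, borrowing from "apple" raises the estimate of P(durian | eat) from 0.2 to about 0.394. It also gives "durian" contexts it never had in the data, such as "buy durian" and "durian pie". A real system would additionally need proper smoothing and normalization across the whole model. How the similar words and their weights are selected is outside the scope of this sketch.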
