An empirical study on language model adaptation

This article presents an empirical study of four techniques for adapting language models, one maximum a posteriori (MAP) method and three discriminative training methods, applied to Japanese Kana-Kanji conversion. We compare the performance of these methods from several angles by adapting a baseline model to four adaptation domains. In particular, we interpret the results in terms of character error rate (CER) by correlating them with characteristics of each adaptation domain, measured using the information-theoretic notion of cross entropy. We show that this metric correlates well with the CER performance of the adaptation methods, and that the discriminative methods are superior to the MAP-based method not only in achieving larger CER reductions, but also in having fewer side effects and in being more robust to the degree of similarity between the background and adaptation domains.
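The abstract names two quantities without giving formulas: cross entropy as a measure of how close an adaptation domain is to the background domain, and CER as the evaluation metric. The sketch below is a minimal illustration of both, assuming a unigram background model with add-one smoothing and a Levenshtein-based CER; the paper's actual language models and evaluation pipeline are not specified here, so these choices are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter

def cross_entropy(background_corpus, adaptation_corpus):
    """Cross entropy (bits per token) of the adaptation corpus under a
    unigram model estimated from the background corpus. Add-one
    smoothing gives unseen tokens nonzero probability. A lower value
    suggests the two domains are more similar."""
    counts = Counter(background_corpus)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen tokens
    log_probs = [
        math.log2((counts[w] + 1) / (total + vocab))
        for w in adaptation_corpus
    ]
    return -sum(log_probs) / len(log_probs)

def cer(hypothesis, reference):
    """Character error rate: Levenshtein edit distance between the
    converted output and the reference string, normalized by the
    reference length (assumed non-empty)."""
    m, n = len(hypothesis), len(reference)
    d = list(range(n + 1))  # one row of the edit-distance table
    for i in range(1, m + 1):
        prev, d[0] = d[0], i
        for j in range(1, n + 1):
            cur = d[j]
            d[j] = min(
                d[j] + 1,      # skip a hypothesis character
                d[j - 1] + 1,  # skip a reference character
                prev + (hypothesis[i - 1] != reference[j - 1]),  # match/substitute
            )
            prev = cur
    return d[n] / n
```

As a usage note, cross_entropy would be called with tokenized background and adaptation text, and cer with each converter output paired against its gold-standard Kana-Kanji conversion; averaging cer over a test set yields the domain-level CER that the study correlates with cross entropy.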
