Factored Language Models for Statistical Machine Translation

Machine translation systems are, on the whole, currently unable to use the output of linguistic tools, such as part-of-speech taggers, to improve translation performance effectively. A new language modeling technique, Factored Language Models, can incorporate the additional linguistic information that these tools produce. In the field of automatic speech recognition, Factored Language Models smoothed with Generalized Parallel Backoff have been shown to significantly reduce language model perplexity. In statistical machine translation, however, Factored Language Models have previously been applied only as part of a second-pass rescoring system. In this thesis, we show that a state-of-the-art phrase-based system using Factored Language Models with Generalized Parallel Backoff can improve performance over an identical system using trigram language models. These improvements appear both with and without additional word features. The relative gain from Factored Language Models increases as the training corpus shrinks, making this approach especially useful for domains with limited data.
