Controlled Ascent: Imbuing Statistical MT with Linguistic Knowledge

We explore the intersection of rule-based and statistical approaches in machine translation, with a particular focus on past and current work at Microsoft Research. Until about 10 years ago, the only machine translation systems worth using were rule-based and linguistically informed. Then came statistical approaches, which use large corpora to guide translations directly toward expressions people would actually say. Rather than relying on local decisions made while writing and conditioning rules, these approaches modeled goodness of translation numerically and selected free parameters to optimize it. This led to huge improvements in translation quality as more and more data was consumed. By necessity, the pendulum is now swinging back toward the inclusion of linguistic features in MT systems. We describe some of our statistical and non-statistical attempts to incorporate linguistic insights into machine translation systems, showing what is currently working well and what isn’t. We also look at the trade-offs of using linguistic knowledge (“rules”) in pre- or post-processing, broken down by language pair, with a particular eye on the return on investment as training data increases in size.
