Modeling Lexical Cohesion for Document-Level Machine Translation

Lexical cohesion arises from a chain of lexical items that establish links between sentences in a text. In this paper we propose three different models to capture lexical cohesion for document-level machine translation: (a) a direct reward model where translation hypotheses are rewarded whenever lexical cohesion devices occur in them, (b) a conditional probability model where the appropriateness of using lexical cohesion devices is measured, and (c) a mutual information trigger model where a lexical cohesion relation is treated as a trigger pair and the strength of the association between the trigger and the triggered item is estimated by mutual information. We integrate the three models into hierarchical phrase-based machine translation and evaluate their effectiveness on the NIST Chinese-English translation tasks with large-scale training data. Experimental results show that all three models achieve substantial improvements over the baseline and that the mutual information trigger model outperforms the others.
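The mutual information trigger model described above scores how strongly an earlier lexical item (the trigger) is associated with a later one (the triggered item) across sentences. A minimal sketch of estimating such pointwise mutual information scores from a tokenized corpus is shown below; the function name and the simple cross-sentence pairing scheme are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter

def mi_trigger_scores(docs):
    """Estimate pointwise MI for cross-sentence trigger pairs (illustrative sketch).

    `docs` is a list of documents, each a list of tokenized sentences.
    A pair (x, y) is counted when x occurs in one sentence and y occurs
    in a later sentence of the same document, i.e. x "triggers" y.
    """
    pair_counts = Counter()   # joint counts of (trigger, triggered) pairs
    word_counts = Counter()   # marginal counts of words over all sentences
    total_pairs = 0
    for doc in docs:
        for i, sent in enumerate(doc):
            for x in set(sent):
                word_counts[x] += 1
            # pair each word in this sentence with words in later sentences
            for j in range(i + 1, len(doc)):
                for x in set(sent):
                    for y in set(doc[j]):
                        pair_counts[(x, y)] += 1
                        total_pairs += 1
    total_words = sum(word_counts.values())
    scores = {}
    for (x, y), c in pair_counts.items():
        p_xy = c / total_pairs
        p_x = word_counts[x] / total_words
        p_y = word_counts[y] / total_words
        # MI(x, y) = log p(x, y) / (p(x) p(y)); positive values indicate
        # the pair co-occurs more often than chance would predict
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores
```

In a decoder, such scores would typically be added as an extra feature that rewards hypotheses containing triggered items whose triggers appeared in earlier translated sentences.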
