Source-Language Entailment Modeling for Translating Unknown Terms

This paper addresses the task of handling unknown terms in statistical machine translation (SMT). We propose using source-language monolingual models and resources to paraphrase the source text prior to translation. We further extend prior work conceptually by allowing translation of entailed texts, not only of paraphrases. We present an efficient method for performing this process and apply it to some 2500 sentences with unknown terms. Our experiments show that the proposed approach substantially increases the number of properly translated texts.
