Context-aware Discriminative Phrase Selection for Statistical Machine Translation

In this work we revisit the application of discriminative learning to the problem of phrase selection in Statistical Machine Translation. Inspired by common techniques used in Word Sense Disambiguation, we train classifiers based on local context to predict possible phrase translations. Our work extends that of Vickrey et al. (2005) in two main aspects. First, we move from word translation to phrase translation. Second, we move from the 'blank-filling' task to the 'full translation' task. We report results on a set of highly frequent source phrases: a rigorous manual evaluation shows a significant improvement, especially with respect to adequacy.
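The classification step can be sketched as follows: a minimal, dependency-free illustration of a per-phrase classifier that predicts a translation from local context. Everything in it is an assumption made for illustration. This includes the feature set (words in a ±2 window around the phrase plus a sentence-level bag of words), the names `context_features`, `PhrasePerceptron` and `example`, and the toy English-to-Spanish data for the phrase "the bank". The learner is a plain multiclass perceptron, chosen only to keep the sketch short. It is not claimed to be the classifier used in this work, and the sketch covers only prediction, not the integration into a decoder needed for the full translation task.

```python
from typing import Dict, List, Tuple


def context_features(tokens: List[str], start: int, end: int, window: int = 2) -> List[str]:
    """Hypothetical local-context features for the source phrase tokens[start:end]."""
    feats = []
    for offset in range(1, window + 1):
        left = start - offset       # offset-th word to the left of the phrase
        right = end - 1 + offset    # offset-th word to the right of the phrase
        feats.append(f"w[-{offset}]=" + (tokens[left] if left >= 0 else "<s>"))
        feats.append(f"w[+{offset}]=" + (tokens[right] if right < len(tokens) else "</s>"))
    # Sentence-level bag of words, excluding the phrase itself
    feats.extend(f"bow={tok}" for i, tok in enumerate(tokens) if not start <= i < end)
    return feats


class PhrasePerceptron:
    """Multiclass perceptron over the candidate translations of ONE source phrase
    (a stand-in learner for illustration, not the method of the paper)."""

    def __init__(self) -> None:
        self.weights: Dict[Tuple[str, str], float] = {}  # (translation, feature) -> weight
        self.labels: List[str] = []

    def score(self, feats: List[str], label: str) -> float:
        return sum(self.weights.get((label, f), 0.0) for f in feats)

    def predict(self, feats: List[str]) -> str:
        # Ties go to the translation seen first during training
        return max(self.labels, key=lambda y: self.score(feats, y))

    def train(self, examples: List[Tuple[List[str], str]], epochs: int = 5) -> None:
        for _, gold in examples:
            if gold not in self.labels:
                self.labels.append(gold)
        for _ in range(epochs):
            for feats, gold in examples:
                guess = self.predict(feats)
                if guess != gold:
                    # Promote the reference translation, demote the wrong guess
                    for f in feats:
                        self.weights[(gold, f)] = self.weights.get((gold, f), 0.0) + 1.0
                        self.weights[(guess, f)] = self.weights.get((guess, f), 0.0) - 1.0


def example(sentence: str, start: int, end: int, label: str) -> Tuple[List[str], str]:
    return context_features(sentence.split(), start, end), label


# Hypothetical toy data: source phrase "the bank" with two Spanish candidate translations
train = [
    example("he deposited money in the bank yesterday", 4, 6, "el banco"),
    example("we sat on the bank of the river", 3, 5, "la orilla"),
    example("the bank approved the loan", 0, 2, "el banco"),
    example("fish swim near the bank of the river", 3, 5, "la orilla"),
]
clf = PhrasePerceptron()
clf.train(train)

query = "the boat drifted toward the bank of the river".split()
print(clf.predict(context_features(query, 4, 6)))  # la orilla
```

In this sketch one classifier is trained per source phrase, with its candidate translations as the classes, in the spirit of the Word Sense Disambiguation analogy: context words such as "river" or "approved" decide between the candidates.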

[1] Andreas Stolcke et al. SRILM - An Extensible Language Modeling Toolkit, 2002, INTERSPEECH.

[2] Enrique Amigó et al. IQmt: A Framework for Automatic Machine Translation Evaluation, 2006, LREC.

[3] Marine Carpuat et al. Evaluating the Word Sense Disambiguation Performance of Statistical Machine Translation, 2005, IJCNLP.

[4] Hermann Ney et al. Error Analysis of Statistical Machine Translation Output, 2006, LREC.

[5] Tong Zhang et al. A Discriminative Global Training Algorithm for Statistical MT, 2006, ACL.

[6] S. Siegel et al. Nonparametric Statistics for the Behavioral Sciences, 2022, The SAGE Encyclopedia of Research Design.

[7] Ben Taskar et al. An End-to-End Discriminative Approach to Machine Translation, 2006, ACL.

[8] Marta R. Costa-jussà et al. Machine Translation System Development Based on Human Likeness, 2006, 2006 IEEE Spoken Language Technology Workshop.

[9] Alon Lavie et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.

[10] Xavier Carreras et al. FreeLing: An Open-Source Suite of Language Analyzers, 2004, LREC.

[11] Hermann Ney et al. A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.

[12] Thorsten Joachims et al. Making Large Scale SVM Learning Practical, 1998.

[13] Lluís Màrquez i Villodre et al. SVMTool: A General POS Tagger Generator Based on Support Vector Machines, 2004, LREC.

[14] Daniel Marcu et al. Statistical Phrase-Based Translation, 2003, NAACL.

[15] Lluís Màrquez i Villodre et al. Combining Linguistic Data Views for Phrase-based SMT, 2005, ParallelText@ACL.

[16] Salim Roukos et al. Bleu: A Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[17] Daphne Koller et al. Word-Sense Disambiguation for Machine Translation, 2005, HLT.

[18] Julio Gonzalo et al. QARLA: A Framework for the Evaluation of Text Summarization Systems, 2005, ACL.

[19] David Yarowsky et al. The Johns Hopkins SENSEVAL2 System Descriptions, 2001.

[20] Xavier Carreras et al. Filtering-Ranking Perceptron Learning for Partial Parsing, 2005, Machine Learning.

[21] Philipp Koehn et al. Pharaoh: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models, 2004, AMTA.

[22] Marine Carpuat et al. Word Sense Disambiguation vs. Statistical Machine Translation, 2005, ACL.

[23] Franz Josef Och et al. Statistical Machine Translation: From Single Word Models to Alignment Templates, 2002.

[24] Chin-Yew Lin et al. Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics, 2004, ACL.

[26] Christiane Fellbaum et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.