Evaluating syntax-driven approaches to phrase extraction for MT

In this paper, we examine several phrase segmentation approaches for Machine Translation and evaluate how they perform when used to supplement the translation model of a phrase-based SMT system. The work summarises several years of research carried out at Dublin City University, which has found that hybrid translation models can yield improvements, although the level of improvement depends on the amount of training data used. We describe the approaches to phrase segmentation and combination that were explored, and outline a series of experiments investigating the relative merits of each method.
