Hybrid Approaches to Machine Translation

This volume provides an overview of the field of Hybrid Machine Translation (MT) and presents some of the latest research conducted by linguists and practitioners from a range of disciplines. Today, most of the important developments in MT are achieved by combining data-driven and rule-based techniques. These combinations typically involve the hybridization of traditional paradigms: introducing linguistic knowledge into statistical approaches to MT, incorporating data-driven components into rule-based approaches, or applying statistical and rule-based pre- and post-processing to both types of MT architecture. The book is of interest primarily to MT specialists, but also to researchers in the wider fields of Computational Linguistics, Machine Learning and Data Mining, and to translators and managers of translation companies and departments who are interested in recent developments in automated translation tools.
