A Hybrid Machine Translation Framework for an Improved Translation Workflow

by Santanu Pal, Doctor of Philosophy, Computerlinguistik, Sprachwissenschaft und Sprachtechnologie, Universität des Saarlandes

Over the past few decades, a continuing surge in the amount of content being translated, together with ever-increasing pressure to deliver high-quality, high-throughput translation, has led the translation industry to adopt advanced technologies such as machine translation (MT) and automatic post-editing (APE) in its translation workflows. Despite the progress of the technology, the roles of humans and machines essentially remain intact, as MT/APE moves from the periphery of the translation field towards collaborative human-machine MT/APE in modern translation workflows. Professional translators increasingly become post-editors who correct raw MT/APE output instead of translating from scratch, which in turn increases productivity in terms of translation speed. The last decade has seen substantial growth in research and development on improving MT, usually concentrating on selected aspects of the workflow, from training data pre-processing techniques to core MT processes to post-editing methods. To date, however, complete MT workflows have been investigated less than the core MT processes. In the research presented in this thesis, we investigate avenues towards improved MT workflows. We study how different MT paradigms can be utilized and integrated to best effect. We also investigate how different upstream and downstream component technologies can be hybridized to achieve overall improved MT. Finally, we investigate human-machine collaborative MT by bringing humans into the loop. In many, but not all, of the experiments presented in this thesis, we focus on data scenarios provided by low-resource language settings.
German Summary (Zusammenfassung) Owing to the steadily growing volume of translation over recent decades and the simultaneous increase in pressure to deliver high quality in the shortest possible time, translation service providers depend on integrating modern technologies such as machine translation (MT) and automatic post-editing (APE) into the translation workflow. Despite considerable progress in these technologies, the roles of human and machine have hardly changed. MT/APE is, however, no longer merely a marginal phenomenon; in the modern translation workflow it is increasingly used in collaboration between human and machine. Professional translators are increasingly becoming post-editors who correct MT/APE output instead of producing translations from scratch, as they did before. In this way, productivity in terms of translation speed can be increased. In the last decade, a great deal has happened in research and development on improving MT: incorporation of the complete translation workflow, from the preparation of training data through the actual MT process to post-editing methods. From a data perspective, however, the complete translation workflow receives far less attention than the actual MT process. This dissertation investigates paths towards an ideal, or at least improved, MT workflow. The experiments pay particular attention to the specific needs of low-resource languages. The dissertation examines how different MT paradigms can be used and optimally integrated. It further shows how different upstream and downstream technology components can be adapted to produce better MT output overall. Finally, it shows how humans can be integrated into the MT workflow.
The aim of this work is to integrate various technology components into the MT workflow in order to create an improved overall workflow. To this end, hybridization approaches are mainly used. This work also investigates ways of effectively involving humans as post-editors. The translation process data obtained in this way
