Czech Machine Translation in the project CzechMate

Abstract: We present various achievements in statistical machine translation from English, German, Spanish and French into Czech. We discuss specific properties of the individual source languages and describe techniques that exploit these properties and address language-specific errors. Besides the translation proper, we also present our contribution to error analysis.
