Asiya: An Open Toolkit for Automatic Machine Translation (Meta-)Evaluation

This article describes the Asiya Toolkit for Automatic Machine Translation Evaluation and Meta-evaluation, an open framework offering system and metric developers a text interface to a rich repository of metrics and meta-metrics.
