Using metrics from complex networks to evaluate machine translation

Automatic metrics for assessing machine translation (MT) systems have become crucial owing to the widespread use of MT on the web. In this study we show that such evaluation can be performed by modeling texts as complex networks. Specifically, we extend our previous work by employing additional complex network metrics, whose values were used as input to machine learning methods and allowed MT texts of distinct quality to be distinguished. We also show that the node-to-node mapping between source and target texts (English–Portuguese and Spanish–Portuguese pairs) can be improved by adding further hierarchical levels to the following metrics: out-degree, in-degree, hierarchical common degree, clustering coefficient, inter-ring degree, intra-ring degree and convergence ratio. The results amount to a proof of principle that the wider context possibly captured by the hierarchical levels may be combined with machine learning methods to yield an approach for assessing the quality of MT systems.
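To make the network-based measurements more concrete, the following is a minimal Python sketch of how a text might be turned into a network and measured. It rests on assumptions not stated in the abstract: the network is taken to be a directed word-adjacency graph (an edge u -> v whenever word v immediately follows word u), tokenization is plain whitespace splitting, and the hierarchical out-degree at level d is taken to be the number of edges leaving the ring of nodes at distance d toward the ring at distance d + 1. The function names (`build_network`, `clustering`, `rings`, `hierarchical_out_degree`) are hypothetical and do not come from the paper.

```python
def build_network(text):
    """Build a directed word-adjacency network (an assumed construction).

    Returns (nodes, succ, pred), where succ[u] is the set of words that
    directly follow u and pred[v] is the set of words that directly precede v.
    """
    words = text.lower().split()
    succ, pred = {}, {}
    for u, v in zip(words, words[1:]):
        if u != v:  # ignore self-loops from repeated words
            succ.setdefault(u, set()).add(v)
            pred.setdefault(v, set()).add(u)
    return set(words), succ, pred


def out_degree(node, succ):
    return len(succ.get(node, ()))


def in_degree(node, pred):
    return len(pred.get(node, ()))


def clustering(node, succ, pred):
    """Directed clustering coefficient: fraction of ordered neighbor pairs
    (a, b) that are linked by an edge a -> b."""
    neighbors = (succ.get(node, set()) | pred.get(node, set())) - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a in neighbors for b in succ.get(a, ()) if b in neighbors)
    return links / (k * (k - 1))


def rings(node, succ, max_level):
    """Return [R_0, R_1, ..., R_max_level], where R_d holds the nodes whose
    shortest outgoing-path distance from `node` is exactly d."""
    visited = {node}
    current = {node}
    result = [current]
    for _ in range(max_level):
        nxt = {v for u in current for v in succ.get(u, ()) if v not in visited}
        visited |= nxt
        result.append(nxt)
        current = nxt
    return result


def hierarchical_out_degree(node, succ, level):
    """Number of edges going from ring `level` to ring `level + 1`."""
    r = rings(node, succ, level + 1)
    return sum(1 for u in r[level] for v in succ.get(u, ()) if v in r[level + 1])


if __name__ == "__main__":
    nodes, succ, pred = build_network("the cat sat on the mat the cat ran")
    for level in range(3):
        print(level, hierarchical_out_degree("the", succ, level))
```

At level 0 the hierarchical out-degree reduces to the ordinary out-degree; higher levels summarize progressively wider neighborhoods of a word. In a pipeline like the one the abstract describes, per-node values of this kind would be collected for source and target texts and passed as features to a classifier.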
