UNED: evaluating text similarity measures without human assessments
This paper describes the participation of the UNED NLP group in the SEMEVAL 2012 Semantic Textual Similarity task. Our contribution is an unsupervised method, Heterogeneity Based Ranking (HBR), for combining similarity measures. Our runs focus on combining standard similarity measures for Machine Translation. Other systems outperform ours in Pearson correlation, owing to the limitations of MT evaluation measures in the context of this task. However, combining the outputs of the systems that participated in the campaign produces three interesting results: (i) combining all systems, without using any kind of human assessments, achieves performance similar to the best peers on all test corpora; (ii) combining the 40 least reliable peers in the evaluation campaign achieves similar results; and (iii) the correlation between each peer and HBR predicts, with a 0.94 correlation, the performance of measures according to human assessments.
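The abstract does not reproduce the HBR definition, so the sketch below only illustrates the general idea it relies on: combining several similarity measures with no human assessments by pooling their rankings. It uses a simple average-of-ranks (Borda-style) scheme as a stand-in; the function name `combine_measures_by_rank` and the toy scores are hypothetical, and this is a minimal sketch under those assumptions, not the authors' actual HBR formula.

```python
import numpy as np

def combine_measures_by_rank(scores):
    """Unsupervised combination of similarity measures (illustrative only).

    scores: (n_pairs, n_measures) array; scores[i, j] is the similarity
    that measure j assigns to text pair i.

    Returns one combined score per pair: the average of the pair's rank
    position under every individual measure (Borda-style, ties broken
    arbitrarily). NOTE: this is a hypothetical stand-in, not the HBR
    formula from the paper.
    """
    n_pairs, n_measures = scores.shape
    ranks = np.empty_like(scores, dtype=float)
    for j in range(n_measures):
        # argsort gives pair indices from lowest to highest score under
        # measure j; assigning 0..n_pairs-1 in that order yields ranks.
        order = scores[:, j].argsort()
        ranks[order, j] = np.arange(n_pairs)
    # A pair that every measure ranks highly gets a high combined score.
    return ranks.mean(axis=1)

# Hypothetical usage: three measures scoring four sentence pairs.
scores = np.array([
    [0.9, 0.8, 0.7],
    [0.2, 0.3, 0.1],
    [0.6, 0.5, 0.9],
    [0.4, 0.4, 0.3],
])
print(combine_measures_by_rank(scores))
```

Under this scheme, a pair scored highly by many measures rises to the top of the combined ranking, which is the intuition behind exploiting agreement among heterogeneous measures; the combined ranking can then be compared to each individual peer (e.g., with Pearson correlation) to estimate that peer's reliability without gold labels, in the spirit of result (iii) above.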