Diversity-aware Evaluation for Paraphrase Patterns

Common evaluation metrics for paraphrase patterns do not necessarily correlate with performance on extrinsic recognition tasks. We propose a metric that gives weight to lexical variety in paraphrase patterns. The proposed metric correlates positively with paraphrase recognition task performance, with a Pearson correlation of 0.5 to 0.7 (k=10, with "strict" judgment) at a statistically significant level (p-value < 0.01).
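
The correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. It only shows how a Pearson correlation between metric scores and task performance is computed; it is not the paper's metric or evaluation code. The function name `pearson` and the two lists of numbers are hypothetical placeholders, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (NOT from the paper): one intrinsic metric score
# and one recognition-task score per paraphrase pattern set.
metric_scores = [0.10, 0.25, 0.30, 0.45, 0.60]
task_scores = [0.20, 0.22, 0.35, 0.33, 0.50]

r = pearson(metric_scores, task_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r near 1 would indicate that higher metric scores tend to go with higher task performance. Judging significance, as in the reported p-value, would require an additional test that this sketch does not implement.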

Teruko Mitamura | Hideki Shima
