Diversity-aware Evaluation for Paraphrase Patterns

Common evaluation metrics for paraphrase patterns do not necessarily correlate with extrinsic recognition task performance. We propose a metric that gives weight to lexical variety in paraphrase patterns. The proposed metric correlates positively with paraphrase recognition task performance, with a Pearson correlation of 0.5 to 0.7 (k=10, with "strict" judgment) that is statistically significant (p < 0.01).
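The correlation reported above is a Pearson coefficient between an intrinsic metric score and extrinsic recognition performance. As a minimal sketch of how such a coefficient is computed, the Python below implements the standard formula directly. The function name `pearson` and the two score lists are hypothetical and purely illustrative; they are not data or code from the paper, and the paper's metric itself is not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up numbers: one intrinsic metric score and one
# recognition-task score per paraphrase pattern set.
metric_scores = [0.21, 0.35, 0.40, 0.52, 0.61]
recognition_scores = [0.30, 0.33, 0.45, 0.44, 0.58]

r = pearson(metric_scores, recognition_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r near 1 would indicate that pattern sets scored higher by the metric also tend to perform better on the recognition task. Significance testing (the p-value) is a separate step that this sketch does not cover.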
