Constructing Corpora for the Development and Evaluation of Paraphrase Systems

Abstract Automatic paraphrasing is an important component in many natural language processing tasks. In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word alignments and show that it yields high inter-annotator agreement. As Kappa is suited to nominal data, we employ an alternative agreement statistic which is appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall, and F1) and also in developing linguistically rich paraphrase models based on syntactic structure.

[1]  W. A. Scott,et al.  Reliability of Content Analysis ; The Case of Nominal Scale Cording , 1955 .

[2]  A. Stuart,et al.  Non-Parametric Statistics for the Behavioral Sciences. , 1957 .

[3]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[4]  Alfred V. Aho,et al.  Syntax Directed Translations and the Pushdown Assembler , 1969, J. Comput. Syst. Sci..

[5]  J. Fleiss Measuring nominal scale agreement among many raters. , 1971 .

[6]  J J Bartko,et al.  ON THE METHODS AND THEORY OF RELIABILITY , 1976, The Journal of nervous and mental disease.

[7]  K. B. Hafner,et al.  On assessing interrater agreement for multiple attribute responses. , 1989, Biometrics.

[8]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[9]  I. Dan Melamed,et al.  Manual Annotation of Translational Equivalence: The Blinker Project , 1998, ArXiv.

[10]  Hermann Ney,et al.  Improved Alignment Models for Statistical Machine Translation , 1999, EMNLP.

[11]  Hermann Ney,et al.  Improved Statistical Alignment Models , 2000, ACL.

[12]  Hermann Ney,et al.  A Comparison of Alignment Models for Statistical Machine Translation , 2000, COLING.

[13]  Chris Mellish,et al.  Instance-Based Natural Language Generation , 2001, NAACL.

[14]  Regina Barzilay,et al.  Extracting Paraphrases from a Parallel Corpus , 2001, ACL.

[15]  Patrick Pantel,et al.  Discovery of inference rules for question-answering , 2001, Natural Language Engineering.

[16]  Daniel M. Bikel,et al.  Design of a multi-lingual, parallel-processing statistical parsing engine , 2002 .

[17]  Regina Barzilay,et al.  Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment , 2003, NAACL.

[18]  Ted Pedersen,et al.  An Evaluation Exercise for Word Alignment , 2003, ParallelTexts@NAACL-HLT.

[19]  Regina Barzilay,et al.  Sentence Alignment for Monolingual Comparable Corpora , 2003, EMNLP.

[20]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[21]  Daniel Marcu,et al.  Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences , 2003, NAACL.

[22]  Kathleen R. McKeown,et al.  Information fusion for multidocument summarization: paraphrasing and generation , 2003 .

[23]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[24]  Chris Quirk,et al.  Monolingual Machine Translation for Paraphrase Generation , 2004, EMNLP.

[25]  Chris Quirk,et al.  Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources , 2004, COLING.

[26]  Barbara Di Eugenio,et al.  Squibs and Discussions: The Kappa Statistic: A Second Look , 2004, CL.

[27]  Daniel Marcu,et al.  A Phrase-Based HMM Approach to Document/Abstract Alignment , 2004, EMNLP.

[28]  Chris Callison-Burch,et al.  Paraphrasing with Bilingual Parallel Corpora , 2005, ACL.

[29]  Joel D. Martin,et al.  Word Alignment for Languages with Scarce Resources , 2005, ParallelText@ACL.

[30]  Michael Collins,et al.  A Discriminative Model for Tree-to-Tree Translation , 2006, EMNLP.

[31]  Necip Fazil Ayan,et al.  Going Beyond AER: An Extensive Analysis of Word Alignments and Their Impact on MT , 2006, ACL.

[32]  Dragos Stefan Munteanu,et al.  ParaEval: Using Paraphrases to Evaluate Summaries Automatically , 2006, NAACL.

[33]  Jennifer Chu-Carroll,et al.  Answering the question you wish they had asked: The impact of paraphrasing for Question Answering , 2006, NAACL.

[34]  Philipp Koehn,et al.  Improved Statistical Machine Translation Using Paraphrases , 2006, NAACL.

[35]  Mirella Lapata,et al.  Large Margin Synchronous Generation and its Application to Sentence Compression , 2007, EMNLP.

[36]  Alexander M. Fraser,et al.  Squibs and Discussions: Measuring Word Alignment Quality for Statistical Machine Translation , 2007, CL.

[37]  Chris Callison-Burch,et al.  Paraphrasing and translation , 2007 .

[38]  Ron Artstein,et al.  Survey Article: Inter-Coder Agreement for Computational Linguistics , 2008, CL.