Paraphrase Extraction from Validated Question Answering Corpora in Spanish

Building on the debate about how paraphrase should be defined, this work aims to clarify empirically what humans consider to be a paraphrase. The experiment starts from one of the several evaluation campaigns that every year generate large amounts of validated textual data, which can be reused for other purposes. This paper describes in detail a simple method, based on pattern matching and on deletion and insertion operations, that extracts a remarkable number of paraphrases from assessed Question Answering corpora. Experts assessed the resulting corpus, and an analysis of this assessment is presented. The work was carried out for Spanish.
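To make the idea of combining pattern matching with deletion and insertion more concrete, the following is a minimal Python sketch. It is not the authors' implementation. The rule set, the `question_to_statement` function and the example questions are hypothetical. The sketch assumes that a question and one of its validated answers can be turned into a declarative sentence by deleting the interrogative material of the question and inserting the answer in its place.

```python
import re
from typing import Optional

# Hypothetical rewriting rules (not taken from the paper). Each rule pairs a
# regular expression over a Spanish question with a declarative template.
# Named groups keep parts of the question; {a} marks where the answer goes.
RULES = [
    (re.compile(r"^¿\s*Quién es (?P<x>.+?)\s*\?$", re.IGNORECASE), "{x} es {a}"),
    (re.compile(r"^¿\s*Qué es (?P<x>.+?)\s*\?$", re.IGNORECASE), "{x} es {a}"),
    # \w is Unicode-aware in Python 3, so accented verbs such as "nació" match.
    (re.compile(r"^¿\s*Cuándo (?P<v>\w+) (?P<x>.+?)\s*\?$", re.IGNORECASE), "{x} {v} en {a}"),
]


def question_to_statement(question: str, answer: str) -> Optional[str]:
    """Rewrite a question plus a validated answer as a declarative sentence.

    Deletion: the interrogative word and the question marks are dropped,
    because they are not captured by any named group.
    Insertion: the answer is placed into the template slot {a}.
    Returns None when no hypothetical rule matches the question.
    """
    q = question.strip()
    for pattern, template in RULES:
        match = pattern.match(q)
        if match:
            return template.format(a=answer.strip(), **match.groupdict())
    return None
```

For example, `question_to_statement("¿Cuándo nació Cervantes?", "1547")` returns `"Cervantes nació en 1547"`. Questions that no rule covers return `None` rather than a guessed rewrite. This keeps the sketch conservative, at the cost of coverage that a real rule set would have to provide.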
