Scaling Web-based Acquisition of Entailment Relations

Paraphrase recognition is a critical step for natural language interpretation. Accordingly, many NLP applications would benefit from high coverage knowledge bases of paraphrases. However, the scalability of state-of-the-art paraphrase acquisition approaches is still limited. We present a fully unsupervised learning algorithm for Web-based extraction of entailment relations, an extended model of paraphrases. We focus on increased scalability and generality with respect to prior work, eventually aiming at a full scale knowledge base. Our current implementation of the algorithm takes as its input a verb lexicon and for each verb searches the Web for related syntactic entailment templates. Experiments show promising results with respect to the ultimate goal, achieving much better scalability than prior Web-based methods.

[1]  Ellen Riloff,et al.  Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping , 1999, AAAI/IAAI.

[2]  Christian Jacquemin,et al.  Syntagmatic and Paradigmatic Representations of Term Variation , 1999, ACL.

[3]  Ralph Grishman,et al.  Unsupervised Discovery of Scenario-Level Patterns for Information Extraction , 2000, ANLP.

[4]  Vasile Rus,et al.  Logic Form Transformation of WordNet and its Applicability to Question Answering , 2001, ACL.

[5]  Regina Barzilay,et al.  Extracting Paraphrases from a Parallel Corpus , 2001, ACL.

[6]  Patrick Pantel,et al.  Discovery of inference rules for question-answering , 2001, Natural Language Engineering.

[7]  Daniel Marcu,et al.  Natural Language Based Reformulation Resource and Wide Exploitation for Question Answering , 2002, TREC.

[8]  François Yvon,et al.  Using the Web as a Linguistic Resource for Learning Reformulations Automatically , 2002, LREC.

[9]  Satoshi Sekine,et al.  Automatic paraphrase acquisition from news articles , 2002 .

[10]  Eduard H. Hovy,et al.  Learning surface text patterns for a Question Answering System , 2002, ACL.

[11]  Ralph Grishman,et al.  An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition , 2003, ACL.

[12]  Regina Barzilay,et al.  Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment , 2003, NAACL.

[13]  Dekang Lin,et al.  Dependency-Based Evaluation of Minipar , 2003 .

[14]  Oren Glickman IDENTIFYING LEXICAL PARAPHRASES FROM A SINGLE CORPUS: A CASE STUDY FOR VERBS , 2003 .

[15]  Daniel Marcu,et al.  Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences , 2003, NAACL.

[16]  Ido Dagan,et al.  PROBABILISTIC TEXTUAL ENTAILMENT: GENERIC APPLIED MODELING OF LANGUAGE VARIABILITY , 2004 .