Extracting Structural Paraphrases from Aligned Monolingual Corpora

We present an approach for automatically learning paraphrases from aligned monolingual corpora. Our algorithm works by generalizing the syntactic paths between corresponding anchors in aligned sentence pairs. Compared with previous work, the structural paraphrases our algorithm generates tend to be much longer on average and can capture long-distance dependencies. In addition to a standalone evaluation of our paraphrases, we describe a question answering application, currently under development, that could benefit substantially from automatically learned structural paraphrases.
