Paraphrase Identification with Lexico-Syntactic Graph Subsumption

This paper presents a new approach to paraphrase identification that extends a previously proposed method for textual entailment. We discuss the relationship between paraphrase and entailment to justify the extension theoretically. The approach uses relatively few resources compared to similar systems, yet it produces results similar to or better than those of other approaches to paraphrase identification, and significantly better results than two baselines. We report results on a standard data set as well as on a new, balanced data set.
