Paraphrase Identification Using Weighted Dependencies and Word Semantics

We propose a novel approach to paraphrase identification that quantifies both the similarity and the dissimilarity between two sentences. Both are assessed from lexico-semantic information, i.e., word semantics, and from syntactic information in the form of dependencies, which are explicit syntactic relations between words in a sentence. To exploit word semantics, words are mapped onto concepts in a taxonomy, and word-to-word similarity metrics are then used to compute their semantic relatedness. Dependencies are obtained using state-of-the-art dependency parsers. An important aspect of the approach is the weighting of missing dependencies, i.e., syntactic relations present in one sentence but not in the other. We report experimental results on the Microsoft Paraphrase Corpus, a standard data set for evaluating approaches to paraphrase identification. The experiments show that the proposed approach achieves state-of-the-art results and, in particular, better precision than other state-of-the-art systems.
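
The idea of scoring matched dependencies against weighted missing ones can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the greedy pairing strategy, the toy synonym table standing in for a taxonomy-based word-to-word metric, the per-relation weights, and the ratio-based decision rule. Dependencies are represented as hypothetical (relation, head, modifier) triples that a parser would normally supply.

```python
from typing import Dict, List, Tuple

Dependency = Tuple[str, str, str]  # (relation, head, modifier); assumed representation

# Hypothetical stand-in for a taxonomy-based word-to-word similarity metric.
TOY_SYNONYMS = {frozenset({"buy", "purchase"}), frozenset({"car", "automobile"})}


def word_sim(a: str, b: str) -> float:
    """Return 1.0 for identical words, 0.9 for toy synonyms, 0.0 otherwise."""
    if a == b:
        return 1.0
    return 0.9 if frozenset({a, b}) in TOY_SYNONYMS else 0.0


def dep_sim(d1: Dependency, d2: Dependency) -> float:
    """Similarity of two dependencies: same relation, then the weaker word match."""
    if d1[0] != d2[0]:
        return 0.0
    return min(word_sim(d1[1], d2[1]), word_sim(d1[2], d2[2]))


def score(deps_a: List[Dependency], deps_b: List[Dependency],
          weights: Dict[str, float], match_threshold: float = 0.5
          ) -> Tuple[float, float]:
    """Greedily pair dependencies; return (similarity, weighted dissimilarity)."""
    remaining_b = list(deps_b)
    unpaired_a: List[Dependency] = []
    similarity = 0.0
    for dep in deps_a:
        best = max(remaining_b, key=lambda c: dep_sim(dep, c), default=None)
        if best is not None and dep_sim(dep, best) >= match_threshold:
            similarity += dep_sim(dep, best)
            remaining_b.remove(best)
        else:
            unpaired_a.append(dep)
    # Missing dependencies: present in one sentence but not the other.
    missing = unpaired_a + remaining_b
    # Relations without an explicit weight default to 1.0 (an assumption).
    dissimilarity = sum(weights.get(rel, 1.0) for rel, _, _ in missing)
    return similarity, dissimilarity


def is_paraphrase(deps_a: List[Dependency], deps_b: List[Dependency],
                  weights: Dict[str, float], ratio_threshold: float = 1.0) -> bool:
    """Assumed decision rule: similarity must outweigh weighted dissimilarity."""
    sim, dis = score(deps_a, deps_b, weights)
    if dis == 0.0:
        return sim > 0.0
    return sim / dis > ratio_threshold


sent_a = [("nsubj", "buy", "john"), ("dobj", "buy", "car")]
sent_b = [("nsubj", "purchase", "john"), ("dobj", "purchase", "automobile"),
          ("tmod", "purchase", "yesterday")]
sent_c = [("nsubj", "sell", "mary"), ("dobj", "sell", "house")]
toy_weights = {"tmod": 0.5}  # hypothetical: a missing temporal modifier counts less

print(is_paraphrase(sent_a, sent_b, toy_weights))
print(is_paraphrase(sent_a, sent_c, toy_weights))
```

In this sketch a low weight on a relation means its absence from one sentence contributes little to the dissimilarity score, so a pair differing only in such a relation can still be judged a paraphrase. In practice the weights and the threshold would have to be tuned on training data.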
