Paraphrase Identification using Semantic Heuristic Features

The paraphrase identification (PI) problem is to classify whether two sentences are close enough in meaning to be termed paraphrases. PI is an important research area with practical applications in information extraction (IE), machine translation, information retrieval, automatic identification of copyright infringement, question answering systems, and intelligent tutoring systems, to name a few. This study presents a novel approach to paraphrase identification that uses semantic heuristic features, with the aim of improving accuracy over state-of-the-art PI systems. Finally, a comprehensive critical analysis of the misclassifications is carried out to provide evidence about the behavior of the proposed approach and about the corpora used in the experiments.
