A Hybrid Approach to Paraphrase Detection

In this paper, we present a hybrid approach to the paraphrase detection task that combines feature engineering with neural methods. First, we represent the words and entities in a given sentence by their pre-trained vectors. Those vectors are then encoded by a bidirectional long short-term memory (LSTM) network. The output matrix of the encoder is fed into an attention network to obtain an attention vector. The final representation of the sentence is the inner product of the output matrix and the attention vector. We conduct experiments on the Microsoft Research Paraphrase Corpus, a popular dataset for benchmarking paraphrase detection methods. The experimental results show that our approach achieves competitive results.
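The attention step described above can be illustrated with a minimal sketch. It assumes the bidirectional LSTM has already produced one output vector per token (the matrix `H` below, with made-up values), and it uses a simple scoring function, `w . tanh(h_t)`, which is an assumption for illustration only; the abstract does not specify how the attention network computes its scores. The names `attention_weights`, `attend`, `H`, and `w` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per token, one column per hidden unit


def softmax(scores: Vector) -> Vector:
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(H: Matrix, w: Vector) -> Vector:
    """Assumed scoring function: score_t = w . tanh(h_t), then softmax."""
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in H]
    return softmax(scores)


def attend(H: Matrix, a: Vector) -> Vector:
    """Product of the encoder outputs with the attention vector:
    a weighted sum of the rows of H, one weight per token."""
    dim = len(H[0])
    return [sum(a[t] * H[t][j] for t in range(len(H))) for j in range(dim)]


# Stand-in for BiLSTM outputs of a 3-token sentence (illustrative values).
H = [[0.1, 0.4], [0.9, -0.2], [0.3, 0.3]]
w = [1.0, 0.5]  # hypothetical learned scoring vector

a = attention_weights(H, w)
sentence_vec = attend(H, a)
```

Here `sentence_vec` has the same dimensionality as a single encoder output, so sentences of different lengths map to fixed-size vectors that can be compared or passed to a classifier together with hand-engineered features.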
