A Hybrid Deep Learning Architecture for Paraphrase Identification

Paraphrase Identification (PI), the binary classification of a sentence pair as paraphrase or non-paraphrase, is an important task in Natural Language Processing. This paper proposes a hybrid Deep Learning architecture for PI that aims to capture as many features as possible from the input natural-language sentences, and explains why an optimized word-embedding approach matters when combined with that architecture. The study also addresses the shortage of training data needed to build a robust Deep Learning model. The architecture is intended to harness the memorizing power of a Long Short-Term Memory (LSTM) neural network and the feature-extracting capability of a Convolutional Neural Network (CNN), together with the optimized word-embedding approach, which aims to capture wide sentential contexts and word order. The proposed model is compared with existing systems and surpasses all of them in accuracy.
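The abstract does not state how the LSTM and CNN components are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' model. It assumes that each sentence is encoded by an LSTM (final hidden state) and by a 1-D convolution with max-over-time pooling, that the two encodings are concatenated, and that the pair is scored from element-wise differences and products. All names, dimensions, and the random toy embeddings are hypothetical; biases are omitted and the weights are untrained, so the output only illustrates the data flow.

```python
import math
import random

random.seed(0)
EMB_DIM, HIDDEN, FILTERS, WINDOW = 8, 6, 5, 3  # hypothetical toy sizes

def rand_matrix(rows, cols, scale=0.1):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumption: random vectors stand in for pretrained word embeddings.
VOCAB = {}
def embed(token):
    if token not in VOCAB:
        VOCAB[token] = [random.uniform(-1, 1) for _ in range(EMB_DIM)]
    return VOCAB[token]

# One weight matrix per LSTM gate, applied to the concatenation [x; h].
GATES = {g: rand_matrix(HIDDEN, EMB_DIM + HIDDEN) for g in "ifoc"}
CONV = rand_matrix(FILTERS, WINDOW * EMB_DIM)

def lstm_encode(vectors):
    """Run a single-layer LSTM over the word vectors; return the last hidden state."""
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for x in vectors:
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(v) for v in matvec(GATES["i"], z)]    # input gate
        f = [sigmoid(v) for v in matvec(GATES["f"], z)]    # forget gate
        o = [sigmoid(v) for v in matvec(GATES["o"], z)]    # output gate
        g = [math.tanh(v) for v in matvec(GATES["c"], z)]  # candidate cell state
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h

def cnn_encode(vectors):
    """Slide filters over word windows, apply ReLU, then max-pool over time."""
    # Pad short sentences with zero vectors so at least one window exists.
    padded = vectors + [[0.0] * EMB_DIM] * max(0, WINDOW - len(vectors))
    windows = [sum(padded[t:t + WINDOW], []) for t in range(len(padded) - WINDOW + 1)]
    feature_maps = [[max(0.0, a) for a in matvec(CONV, w)] for w in windows]
    return [max(column) for column in zip(*feature_maps)]

def encode(sentence):
    """Concatenate the LSTM and CNN views of one sentence."""
    vectors = [embed(tok) for tok in sentence.lower().split()]
    return lstm_encode(vectors) + cnn_encode(vectors)

OUT = [random.uniform(-0.1, 0.1) for _ in range(2 * (HIDDEN + FILTERS))]

def paraphrase_probability(s1, s2):
    """Score a sentence pair from symmetric comparison features."""
    a, b = encode(s1), encode(s2)
    features = [abs(x - y) for x, y in zip(a, b)] + [x * y for x, y in zip(a, b)]
    return sigmoid(sum(w * f for w, f in zip(OUT, features)))

if __name__ == "__main__":
    print(paraphrase_probability("the cat sat on the mat",
                                 "a cat was sitting on the mat"))
```

Because the comparison features are symmetric, swapping the two sentences gives the same score. A trained system would learn the gate, filter, and output weights from labeled sentence pairs rather than sampling them at random.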
