Paraphrasing Identification Techniques in English and Arabic Texts

Paraphrasing is the rewriting of a sentence in different words while preserving the meaning of the original. Paraphrase identification is the task of detecting whether one sentence is a paraphrase of another. This paper surveys research that has studied and proposed methods for paraphrase identification in English and Arabic texts. It also examines the impact of paraphrasing on NLP applications and the available paraphrasing benchmarks. A comparative study of the available paraphrase identification techniques shows that, for English texts, WordNet-based techniques give the best precision, while deep learning combined with statistical features gives the best accuracy. For Arabic texts, the best precision is obtained with distributed word vector representations combined with a Convolutional Neural Network.
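Because the comparison above ranks techniques by precision in some cases and by accuracy in others, it may help to recall how the two metrics differ when paraphrase identification is treated as binary classification. The following is a minimal sketch, not taken from the surveyed work: the function name `precision_and_accuracy` and the label lists are hypothetical and chosen only for illustration, with 1 meaning "paraphrase" and 0 meaning "not a paraphrase".

```python
from typing import Sequence, Tuple


def precision_and_accuracy(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, accuracy) for binary labels where 1 = paraphrase."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == 1 and p == 1)
    false_pos = sum(1 for g, p in pairs if g == 0 and p == 1)
    correct = sum(1 for g, p in pairs if g == p)

    # Precision: of the pairs predicted as paraphrases, the fraction that really are.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Accuracy: the fraction of all pairs labelled correctly.
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, accuracy


# Hypothetical labels for eight sentence pairs (illustrative only, not real results).
gold = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0]

precision, accuracy = precision_and_accuracy(gold, predicted)
print(f"precision={precision:.2f} accuracy={accuracy:.2f}")
```

In this toy case a cautious system that flags only one pair as a paraphrase makes no false positives, yet it misses two true paraphrases, so its precision is higher than its accuracy. This is why a technique can lead on one metric without leading on the other.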
