ANFIS-Based Model for Improved Paraphrase Rating Prediction

Paraphrase rating is an important problem with applications in plagiarism detection, machine translation, text summarization, question answering, web search and information retrieval. In this paper, we present a model based on an adaptive neuro-fuzzy inference system (ANFIS) for automatically rating the semantic equivalence of sentence pairs. Using a corpus of sentence pairs judged by humans, we first compute a set of lexical similarity metrics for each pair. We then construct a model that predicts the mean of the ratings assigned by several human judges. For both the individual metrics and the model output, we study the correlation with the actual ratings and the prediction errors, using a nonlinear logistic regression function. The experimental results show that the proposed method achieves much higher correlations and lower error rates than any of the individual metrics.
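The pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The two lexical metrics (word-level Jaccard overlap and normalized character edit-distance similarity) are stand-ins chosen here, since the abstract does not name the metrics used. The Gaussian membership functions, the four-rule first-order Sugeno rule base, and every parameter value in the hypothetical `MF` and `RULES` tables are placeholders; a real ANFIS would learn them from the human ratings, for example with Jang's hybrid learning (least squares for the consequents, gradient descent for the premises). The logistic-regression mapping applied before computing correlations is omitted. The block shows only the ANFIS forward pass through its five layers and a Pearson correlation helper for comparing predictions with mean human ratings.

```python
import math
from typing import Sequence

# --- Lexical similarity metrics (stand-ins; the abstract does not name the metrics used) ---

def jaccard_similarity(s1: str, s2: str) -> float:
    """Word-level Jaccard overlap: size of intersection / size of union of token sets."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def levenshtein(x: str, y: str) -> int:
    """Character edit distance via dynamic programming (unit costs)."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(prev[j] + 1,                # delete cx
                            curr[j - 1] + 1,            # insert cy
                            prev[j - 1] + (cx != cy)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def edit_similarity(s1: str, s2: str) -> float:
    """1 - edit distance / length of the longer string, in [0, 1]."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s1, s2) / longest


# --- ANFIS forward pass (first-order Sugeno, two inputs, four rules) ---

def gaussian(x: float, c: float, sigma: float) -> float:
    """Gaussian membership function with centre c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

# HYPOTHETICAL premise parameters (centre, width); ANFIS would tune these.
MF = {"low": (0.0, 0.3), "high": (1.0, 0.3)}

# HYPOTHETICAL consequent parameters (p, q, r) for f = p*x1 + q*x2 + r;
# ANFIS would fit these by least squares against the mean human ratings.
RULES = {
    ("low", "low"): (0.5, 0.5, 0.0),
    ("low", "high"): (1.0, 1.5, 0.5),
    ("high", "low"): (1.5, 1.0, 0.5),
    ("high", "high"): (2.0, 2.0, 1.0),
}


def anfis_forward(x1: float, x2: float) -> float:
    """Predicted rating for one sentence pair with metric scores x1, x2."""
    weights, outputs = [], []
    for (set1, set2), (p, q, r) in RULES.items():
        # Layers 1-2: membership degrees combined with the product T-norm.
        weights.append(gaussian(x1, *MF[set1]) * gaussian(x2, *MF[set2]))
        # Linear rule consequent f_i (weighted in layer 4 below).
        outputs.append(p * x1 + q * x2 + r)
    total = sum(weights)
    # Layers 3-5: normalise firing strengths, weight each consequent, and sum.
    return sum(w / total * f for w, f in zip(weights, outputs))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predictions and reference ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    a = "the cat sat on the mat"
    b = "the cat was sitting on the mat"
    print(anfis_forward(jaccard_similarity(a, b), edit_similarity(a, b)))
```

With metric scores in [0, 1], the placeholder consequents keep every rule output, and hence the prediction, between 0 and 5; that range reflects only these made-up parameters, not the rating scale of the corpus used in the paper.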
