An Ensemble Similarity Model for Short Text Retrieval

The rapid growth of the World Wide Web has made information retrieval technology, such as querying for information needs, more easily accessible. One such platform is online question answering (QA). Members of online communities can post questions and receive direct responses to their specific information needs on various platforms. This creates large, unorganized repositories of valuable knowledge. Effective QA retrieval is required to make these repositories accessible so that users' information requests can be fulfilled quickly. The repositories may already contain questions and answers similar to a user's newly asked question. This paper explores similarity-based models for ranking search result candidates in a QA system. We use Damerau-Levenshtein distance and a cosine similarity model to obtain ranking scores between the question posted by a registered user and similar candidate questions in the repository. Experimental results indicate that the proposed ensemble models are encouraging and yield significantly better similarity values, improving search ranking results.
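The abstract does not state how the two measures are combined, so the following is only a minimal sketch of one plausible ensemble, not the authors' method. It assumes a weighted average of a normalized edit similarity and a bag-of-words cosine similarity. The function names, the `weight` parameter, whitespace tokenization, and the use of the restricted (optimal string alignment) variant of Damerau-Levenshtein distance are all illustrative assumptions.

```python
import math
from collections import Counter


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def edit_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between term-frequency vectors (assumed whitespace tokens)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def ensemble_score(query: str, candidate: str, weight: float = 0.5) -> float:
    """Assumed combination rule: weighted average of the two similarities."""
    edit = edit_similarity(query.lower(), candidate.lower())
    cos = cosine_similarity(query, candidate)
    return weight * edit + (1.0 - weight) * cos


def rank(query: str, candidates: list[str], weight: float = 0.5) -> list[str]:
    """Order candidate questions from most to least similar to the query."""
    return sorted(candidates, key=lambda c: ensemble_score(query, c, weight), reverse=True)


if __name__ == "__main__":
    stored = ["best pizza in town", "how to rest password", "how to reset my password"]
    for question in rank("how to reset password", stored):
        print(question)
```

The edit-based term tolerates character-level typos such as transposed letters, while the cosine term rewards shared words regardless of order. That complementarity is one reason the two measures might be combined, though the weighting used here is arbitrary.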
