An integrated approach for measuring semantic similarity between words and sentences using web search engine

Semantic similarity measures play a vital role in Information Retrieval (IR) and Natural Language Processing (NLP). Despite their usefulness in various applications, accurately measuring the semantic similarity between two words remains a challenging task. Here, we propose three semantic similarity measures that use the information available on the web to measure similarity between words and sentences. The proposed methods exploit page counts and text snippets returned by a web search engine. In addition to direct associations between words, we use indirect associations to estimate their similarity. Evaluation results on different data sets show that our methods outperform several competing methods.
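
To make the idea of page-count-based similarity concrete, the following is a minimal sketch of two generic co-occurrence scores (a Jaccard coefficient and pointwise mutual information) computed from search-engine hit counts. These are not the three measures proposed here, which the abstract does not specify; the function names, the argument names, and the assumed `total_pages` index size are hypothetical, and the page counts are taken as plain integers already obtained from some search engine.

```python
import math


def web_jaccard(count_p: int, count_q: int, count_pq: int) -> float:
    """Jaccard coefficient over page counts (illustrative, not the paper's measure).

    count_p, count_q: hits for each word queried alone.
    count_pq: hits for the conjunctive query "P AND Q".
    """
    union = count_p + count_q - count_pq
    if count_pq <= 0 or union <= 0:
        return 0.0
    return count_pq / union


def web_pmi(count_p: int, count_q: int, count_pq: int, total_pages: int) -> float:
    """Pointwise mutual information (base 2) over page counts.

    total_pages is an assumed estimate of the number of indexed pages.
    Returns 0.0 when any count is zero, so the score stays defined.
    """
    if min(count_p, count_q, count_pq, total_pages) <= 0:
        return 0.0
    p_joint = count_pq / total_pages
    p_p = count_p / total_pages
    p_q = count_q / total_pages
    return math.log2(p_joint / (p_p * p_q))


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(web_jaccard(100, 50, 25))
    print(web_pmi(256, 256, 128, 1024))
```

Scores of this kind capture only direct co-occurrence of the two words; snippet-based evidence and indirect associations, as mentioned above, would need additional machinery beyond this sketch.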
