A Query Expansion Algorithm Based on Phrases Semantic Similarity

During indexing, a traditional search engine reduces each web page to a list of terms. Single terms cannot represent the rich content of a web page, so retrieval methods based mainly on term matching often suffer from low precision. This paper proposes a novel query expansion technique that uses phrases as its expansion units. Phrases typically carry more information and are less ambiguous than their constituent words, and therefore represent the concepts expressed in text more accurately than single terms. The method extracts key phrases from the results of the original query. It then computes the semantic similarity between the query phrase and each extracted phrase with a WordNet-based semantic similarity algorithm, expands the query with the most similar phrases, and searches again. Experimental results show that the proposed algorithm achieves higher precision than traditional query expansion methods.
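The extract, rank, and expand pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation. A tiny hand-built IS-A taxonomy (`TAXONOMY`) stands in for WordNet. Word similarity uses the Wu-Palmer path measure, which is one of several WordNet-based measures; the abstract does not say which one the paper uses. Phrase similarity averages each word's best match in the other phrase, and key phrases are approximated by adjacent non-stopword bigrams in the result snippets. All names (`wup`, `phrase_similarity`, `extract_phrases`, `expand_query`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy IS-A taxonomy (child -> parent) standing in for WordNet.
# "entity" is the single root, so every pair of known words shares an ancestor.
TAXONOMY = {
    "artifact": "entity",
    "act": "entity",
    "vehicle": "artifact",
    "device": "artifact",
    "car": "vehicle",
    "truck": "vehicle",
    "engine": "device",
    "repair": "act",
    "sale": "act",
}
KNOWN = set(TAXONOMY) | set(TAXONOMY.values())
STOPWORDS = {"a", "an", "and", "the", "in", "for", "of", "to", "on"}


def ancestors(word):
    """Return the path from word up to the root, starting with word itself."""
    chain = [word]
    while chain[-1] in TAXONOMY:
        chain.append(TAXONOMY[chain[-1]])
    return chain


def depth(word):
    """Depth of a node, counting the root as depth 1."""
    return len(ancestors(word))


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    if a == b:
        return 1.0
    if a not in KNOWN or b not in KNOWN:
        return 0.0  # assumption: words outside the taxonomy only match themselves
    anc_b = set(ancestors(b))
    lcs = next(node for node in ancestors(a) if node in anc_b)  # lowest common subsumer
    return 2 * depth(lcs) / (depth(a) + depth(b))


def _directional(src, dst):
    """Average, over words in src, of each word's best match in dst."""
    return sum(max(wup(w, v) for v in dst) for w in src) / len(src)


def phrase_similarity(p, q):
    """Symmetric phrase similarity (assumed aggregation, not from the paper)."""
    a, b = p.split(), q.split()
    return (_directional(a, b) + _directional(b, a)) / 2


def extract_phrases(snippets, min_freq=1):
    """Hypothetical key-phrase extractor: adjacent non-stopword bigrams."""
    counts = Counter()
    for text in snippets:
        words = re.findall(r"[a-z]+", text.lower())
        for a, b in zip(words, words[1:]):
            if a not in STOPWORDS and b not in STOPWORDS:
                counts[f"{a} {b}"] += 1
    return [p for p, c in counts.items() if c >= min_freq]


def expand_query(query, snippets, k=2):
    """Return the query followed by the k extracted phrases most similar to it."""
    candidates = [p for p in extract_phrases(snippets) if p != query]
    ranked = sorted(candidates, key=lambda p: phrase_similarity(query, p), reverse=True)
    return [query] + ranked[:k]


if __name__ == "__main__":
    # Toy snippets standing in for the results of the original query.
    results = [
        "car repair and engine repair",
        "truck repair in the city",
        "car sale for the family",
    ]
    print(expand_query("car repair", results))
    # prints ['car repair', 'truck repair', 'car sale']
```

With these toy snippets, `truck repair` (0.875) and `car sale` (about 0.83) outrank `engine repair` (0.75), so they are the phrases appended to the query. Resubmitting the expanded query to a search engine is left out of the sketch. Against real data, the toy taxonomy could be replaced by WordNet synsets, for example through NLTK's `wordnet` corpus reader and its `wup_similarity` method.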
