Hierarchical Location and Topic Based Query Expansion

In this paper, we propose a novel approach to query expansion that exploits both the location information and the topic information of queries. Users at different locations tend to have different vocabularies, yet expressions drawn from different vocabularies may relate to the same topics. Such expressions are therefore identified as location sensitive and can be used for query expansion. We propose a hierarchical query expansion model that employs a two-level SVM classification model: the first level classifies queries as location sensitive or location non-sensitive, and the second level further classifies the location sensitive queries as same location sensitive or different location sensitive. For location sensitive queries, we propose a topic-level query similarity measure based on Latent Dirichlet Allocation (LDA) to rank the list of similar queries. Experiments on 2 GB of raw log data from CiteSeer and Excite show that our hierarchical classification model predicts query location sensitivity with more than 80% precision and that the final search results are significantly better than those of existing query expansion methods.
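The two stages of the hierarchical model can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the two trained SVMs are replaced by hypothetical linear decision functions (`linear_decision`, with made-up weights), each query's topic distribution is assumed to have already been inferred by an LDA model, and because the abstract does not specify the topic-level similarity measure, 1 minus the Jensen-Shannon divergence (base 2) is used as a stand-in. All function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A decision function returns a signed score; > 0 means the positive class.
Decision = Callable[[Sequence[float]], float]


def linear_decision(weights: Sequence[float], bias: float) -> Decision:
    """Hypothetical stand-in for a trained linear SVM: returns w . x + b."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias


def classify_location_sensitivity(
    features: Sequence[float], level1: Decision, level2: Decision
) -> str:
    """Two-level hierarchical classification (sketch).

    Level 1 separates location sensitive from location non-sensitive queries;
    level 2 is applied only to queries that level 1 marks as location sensitive.
    """
    if level1(features) <= 0:
        return "non-sensitive"
    return "same-location" if level2(features) > 0 else "different-location"


def _kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q) in bits; terms with p_i == 0 contribute nothing.
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def topic_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """1 - Jensen-Shannon divergence (base 2) of two topic distributions.

    Assumed stand-in measure; the result lies in [0, 1], with 1 for identical
    distributions and 0 for distributions with disjoint support.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 1.0 - 0.5 * (_kl_divergence(p, m) + _kl_divergence(q, m))


def rank_similar_queries(
    target: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank candidate queries by topic similarity to the target, best first."""
    scored = [(query, topic_similarity(target, theta)) for query, theta in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, with `level1 = linear_decision([1.0, 0.0], -0.5)` and `level2 = linear_decision([0.0, 1.0], -0.5)`, the feature vector `[0.9, 0.1]` passes level 1 and is routed to `"different-location"` by level 2. Jensen-Shannon divergence is used here because it is symmetric and bounded, which keeps scores comparable across candidate queries; cosine similarity over the topic vectors would be an equally plausible choice.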
