Diversifying Query Suggestions by Using Topics from Wikipedia

Diversifying query suggestions, so that the recommended queries are both relevant and diverse, has recently attracted attention. Most existing work diversifies suggestions by analyzing query logs; for structured data, however, query logs are not always available. To address this, this paper studies the problem of suggesting diverse query terms by using topics from Wikipedia, a successful online encyclopedia with high coverage of entities and concepts. We first obtain all relevant topics from Wikipedia and then map each term to these topics. Because this mapping is nontrivial, we leverage information from both Wikipedia and the structured data to map each term to topics semantically. Finally, we propose a fast algorithm to generate the suggestions efficiently. Extensive evaluations on a real dataset show that our approach yields promising results.
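The abstract does not spell out the suggestion algorithm, so the following is only an illustrative sketch of the general idea of topic-based diversification, not the paper's method. It assumes that each candidate term already has a relevance score and a set of mapped Wikipedia topics, and it swaps in a simple greedy rule that rewards terms covering topics not yet represented. The function name `suggest_diverse_terms`, the `trade_off` parameter, and the sample scores are all hypothetical.

```python
from typing import Dict, List, Set


def suggest_diverse_terms(
    relevance: Dict[str, float],
    term_topics: Dict[str, Set[str]],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Greedily pick up to k terms, balancing relevance against new-topic coverage.

    Illustrative heuristic only; the scoring rule is an assumption, not the
    algorithm proposed in the paper.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    candidates = set(relevance)

    while candidates and len(selected) < k:

        def gain(term: str) -> float:
            topics = term_topics.get(term, set())
            # Fraction of this term's topics that no selected term covers yet.
            novelty = len(topics - covered) / len(topics) if topics else 0.0
            return trade_off * relevance[term] + (1.0 - trade_off) * novelty

        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        covered |= term_topics.get(best, set())
        candidates.remove(best)

    return selected


# Hypothetical input: two terms share a topic, one term covers a different topic.
relevance = {"jaguar car": 0.9, "jaguar price": 0.85, "jaguar habitat": 0.6}
term_topics = {
    "jaguar car": {"Automobile"},
    "jaguar price": {"Automobile"},
    "jaguar habitat": {"Animal"},
}
print(suggest_diverse_terms(relevance, term_topics, k=2))
```

With these sample values, the second pick goes to the less relevant term from an uncovered topic rather than to the second most relevant term, which is the behavior diversification is meant to produce. Setting `trade_off` to 1.0 reduces the rule to plain relevance ranking.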
