Query subtopic mining for search result diversification

Web search queries are usually short and ambiguous, and often cover multiple aspects or subtopics. Different users may have different search intents (or information needs) when submitting the same query. The task of identifying the subtopics underlying a query has received much attention in recent years. In this paper, we propose a method that exploits query reformulations provided by three major Web search engines (WSEs) to uncover different query subtopics. We estimate the importance of each subtopic using multiple query-dependent and query-independent features, and rank the subtopics by balancing relevance and novelty. Our experiments on the NTCIR-10 INTENT-2 English Subtopic Mining test collection show that our method outperforms all participants' methods in the NTCIR-10 INTENT-2 task in terms of D#-nDCG@10.
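To make the idea of ranking subtopics by balancing relevance and novelty concrete, the following is a minimal sketch of a greedy, MMR-style selection loop. It is not the method of this paper: the function names (`jaccard`, `rank_subtopics`), the `trade_off` parameter, the word-overlap similarity, and the example relevance scores are all assumptions chosen for illustration.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two subtopic strings (assumed choice)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank_subtopics(relevance: Dict[str, float], trade_off: float = 0.7) -> List[str]:
    """Greedy MMR-style ranking (hypothetical helper).

    At each step, pick the candidate that maximizes
    trade_off * relevance - (1 - trade_off) * redundancy,
    where redundancy is the highest similarity to any subtopic already ranked.
    """
    remaining = dict(relevance)
    ranked: List[str] = []
    while remaining:
        def score(candidate: str) -> float:
            redundancy = max((jaccard(candidate, s) for s in ranked), default=0.0)
            return trade_off * remaining[candidate] - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        del remaining[best]
    return ranked


# Made-up relevance scores for reformulations of the ambiguous query "jaguar".
candidates = {
    "jaguar car price": 0.9,
    "jaguar car prices": 0.85,
    "jaguar animal habitat": 0.6,
}
print(rank_subtopics(candidates, trade_off=0.5))
```

With `trade_off=0.5`, the near-duplicate "jaguar car prices" is pushed below the less relevant but more novel "jaguar animal habitat"; with `trade_off=1.0`, the loop reduces to sorting by relevance alone.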
