Exploiting query reformulations for web search result diversification

When a Web user's underlying information need is not clearly specified by the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result diversification, which explicitly accounts for the various aspects associated with an underspecified query. In particular, we diversify a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole. We thoroughly evaluate our framework in the context of the diversity task of the TREC 2009 Web track. Moreover, we exploit query reformulations provided by three major Web search engines (WSEs) as a means to uncover different query aspects. The results attest to the effectiveness of our framework when compared to state-of-the-art diversification approaches in the literature. Additionally, by simulating an upper-bound query reformulation mechanism from official TREC data, we draw useful insights regarding the effectiveness of the query reformulations generated by the different WSEs in promoting diversity.
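
The diversification idea described above can be illustrated with a minimal sketch. It is not taken from the paper itself: it assumes a greedy re-ranking in which each candidate's score mixes its relevance to the initial query with its coverage of aspects that the documents already selected leave unsatisfied. The function name `diversify`, the mixing parameter `lam`, and all probabilities in the toy data are hypothetical choices made for illustration, with each aspect standing in for one query reformulation.

```python
from typing import Dict, List


def diversify(
    relevance: Dict[str, float],            # assumed estimate of P(d | q)
    aspect_weight: Dict[str, float],        # assumed estimate of P(aspect | q)
    coverage: Dict[str, Dict[str, float]],  # assumed estimate of P(d | aspect)
    k: int,
    lam: float = 0.5,                       # hypothetical relevance/diversity trade-off
) -> List[str]:
    """Greedily pick k documents, favouring aspects not yet satisfied."""
    selected: List[str] = []
    remaining = set(relevance)
    # not_covered[a]: probability that aspect a is still unsatisfied
    # by the documents selected so far.
    not_covered = {a: 1.0 for a in aspect_weight}

    def score(doc: str) -> float:
        diversity = sum(
            aspect_weight[a] * coverage[a].get(doc, 0.0) * not_covered[a]
            for a in aspect_weight
        )
        return (1.0 - lam) * relevance[doc] + lam * diversity

    while remaining and len(selected) < k:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        # Discount every aspect by how well the chosen document satisfies it.
        for a in aspect_weight:
            not_covered[a] *= 1.0 - coverage[a].get(best, 0.0)
    return selected


# Toy data for an ambiguous query such as "jaguar" (all values are made up).
relevance = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
aspect_weight = {"car": 0.5, "animal": 0.5}
coverage = {"car": {"d1": 0.9, "d2": 0.8}, "animal": {"d3": 0.9}}

print(diversify(relevance, aspect_weight, coverage, k=3, lam=0.5))
print(diversify(relevance, aspect_weight, coverage, k=3, lam=0.0))
```

With `lam=0.5`, the second pick is `d3`, the only document covering the "animal" aspect, even though `d2` is more relevant to the initial query. With `lam=0.0` the sketch falls back to ranking by relevance alone.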
