Search result diversity for informational queries

Ambiguous queries constitute a significant fraction of search instances and pose real challenges to web search engines. With current approaches, the top results for these queries tend to be homogeneous, making it difficult for users interested in less popular aspects to find relevant documents. While existing research in search diversification offers several solutions for introducing variety into the results, most of this work is predicated, implicitly or otherwise, on the assumption that a single relevant document will fulfill a user's information need, which makes it inadequate for many informational queries. In this paper we present a search-diversification algorithm particularly suitable for informational queries, which explicitly models the possibility that a user may need more than one page to satisfy their information need. This modeling enables our algorithm to make a well-informed tradeoff among a user's desire for multiple relevant documents, probabilistic information about an average user's interest in the subtopics of a multifaceted query, and uncertainty in classifying documents into those subtopics. We evaluate the effectiveness of our algorithm against commercial search engine results and other modern ranking strategies, demonstrating notable improvement in multiple-document scenarios.
