A ranking framework and evaluation for diversity-based retrieval

There has been growing momentum in building information retrieval (IR) systems that consider both the relevance and the diversity of retrieved information, which together improve the usefulness of search results as perceived by users. Some users genuinely require a set of multiple results because no single result completely fulfils their information need. Others may be uncertain about their information need and submit ambiguous or broad (faceted) queries, either intentionally or unintentionally. A sensible approach to these problems is to diversify search results so that they address all possible senses underlying those queries, or all possible answers satisfying the information need. In this thesis, we explore three aspects of diversity-based document retrieval: 1) recommender systems, 2) retrieval algorithms, and 3) evaluation measures.

The first goal of this thesis is to provide an understanding of the need for diversity in search results from the users’ perspective. We develop an interactive recommender system for the purpose of a user study. Designed to support users engaged in exploratory search, the system features content-based browsing, aspectual interfaces, and diverse recommendations. While the diverse recommendations allow users to discover more and different aspects of a search topic, the aspectual interfaces allow users to manage and structure their own search process and results according to the aspects found during browsing. The recommendation feature mines implicit relevance feedback extracted from a user’s browsing trails and diversifies recommended results with respect to document content. The results of our user-centred experiment show that result diversity is needed in realistic retrieval scenarios.

Next, we propose a new ranking framework for promoting diversity in a ranked list. We combine two distinct result diversification patterns; this leads to a general framework that enables the development of a variety of ranking algorithms for diversifying documents. To validate our proposal and to gain more insight into approaches for diversifying documents, we empirically compare our integration framework against a common ranking approach (i.e. the probability ranking principle) as well as several diversity-based ranking strategies. These include maximal marginal relevance, modern portfolio theory, and sub-topic-aware diversification based on sub-topic modelling techniques, e.g. clustering, latent Dirichlet allocation, and probabilistic latent semantic analysis. Our findings show that the two diversification patterns can be employed together to improve the effectiveness of ranking diversification. Furthermore, we find that the effectiveness of our framework depends mainly on the effectiveness of the underlying sub-topic modelling techniques.

Finally, we examine evaluation measures for diversity retrieval. We analytically identify an issue affecting the de facto standard measure, novelty-biased discounted cumulative gain (α-nDCG). This issue prevents the measure from behaving as desired, i.e. assessing how effectively systems provide complete coverage of sub-topics while avoiding excessive redundancy. We show that this issue is important because it strongly affects the evaluation of retrieval systems, specifically by overrating top-ranked systems that repeatedly retrieve redundant information. To overcome this issue, we derive a theoretically sound solution by defining a safe threshold for the measure's parameter on a per-query basis. We examine the impact of arbitrary settings of the α-nDCG parameter. We evaluate the intuitiveness and reliability of α-nDCG when using our proposed setting on both real and synthetic rankings. We demonstrate that the diversity of document rankings can be intuitively measured by employing the safe threshold. Moreover, our proposal does not harm, but instead increases, the reliability of the measure in terms of discriminative power, stability, and sensitivity.
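
To make the α-nDCG discussion concrete, the following is a minimal sketch of the commonly cited unnormalised form of the measure (α-DCG), in which a document's gain for a sub-topic is damped by a factor of (1 − α) for every earlier document that already covered that sub-topic. The function name `alpha_dcg`, the toy rankings, and the chosen α values are illustrative assumptions, not taken from the thesis, and normalisation by an ideal ranking is omitted. The toy data only illustrates one way a redundant ranking can outscore a more diverse one under an arbitrary α; it is not the thesis's analysis or its proposed threshold.

```python
import math
from typing import Dict, List, Set


def alpha_dcg(ranking: List[Set[str]], alpha: float, depth: int) -> float:
    """Unnormalised alpha-DCG at a given depth.

    Each element of `ranking` is the set of sub-topics that the document at
    that rank is judged relevant to (binary judgements are assumed here).
    """
    seen: Dict[str, int] = {}  # how often each sub-topic has been covered so far
    score = 0.0
    for rank, subtopics in enumerate(ranking[:depth], start=1):
        # Gain is damped by (1 - alpha) for every earlier hit on the same sub-topic.
        gain = sum((1.0 - alpha) ** seen.get(s, 0) for s in subtopics)
        score += gain / math.log2(1 + rank)  # log-based rank discount
        for s in subtopics:
            seen[s] = seen.get(s, 0) + 1
    return score


# Hypothetical query with four sub-topics: a, b, c, d.
redundant = [{"a", "b", "c"}, {"a", "b", "c"}]  # second document adds nothing new
novel = [{"a", "b", "c"}, {"d"}]                # second document completes coverage

for alpha in (0.5, 0.9):
    print(alpha, alpha_dcg(redundant, alpha, 2), alpha_dcg(novel, alpha, 2))
```

In this toy case the redundant ranking scores higher at α = 0.5, while the ranking that covers all four sub-topics scores higher at α = 0.9, which suggests why the choice of α may need to take the query's sub-topics into account.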
