A probabilistic model to exploit user expectations in XML information retrieval

XML information retrieval models return nested elements as results to a user query. The user explores only the elements that he or she expects to be relevant. An element's structural characteristics (tag, position) attract the user's attention. Important elements must be boosted according to their structural context. Using element importance as a prior probability improves retrieval effectiveness.

The main objective of this paper is to exploit a new source of evidence, derived from the hierarchical structure of documents, for XML information retrieval. We consider the structure of an XML document to be an important source of prior knowledge: the structural features of an element may lead the user to consider that element relevant. We build a simple, well-motivated probabilistic model that estimates the probability that the structural characteristics of an element attract the user to explore its content and consider it relevant. This probability reflects the importance of the element's context. Finally, we demonstrate the effectiveness of context importance through comprehensive experimental studies carried out on the IEEE XML document collection. Experimental results show that the proposed approach outperforms models exploiting other sources of evidence.
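
To make the idea of element importance as a prior probability concrete, the following is a minimal sketch, not the paper's estimator. It assumes a structural context is a (tag, position) pair, estimates the prior as a smoothed relevance frequency per context, and adds its logarithm to a content-based log query likelihood. The function names, the smoothing constant `alpha`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def estimate_context_prior(all_contexts, relevant_contexts, alpha=1.0):
    """Smoothed estimate of P(relevant | context) for each structural context.

    Assumption: a context is a (tag, position) pair, and the prior is an
    add-alpha smoothed relevance frequency. This is an illustrative stand-in,
    not the model proposed in the paper.
    """
    total = Counter(all_contexts)
    relevant = Counter(relevant_contexts)
    return {c: (relevant[c] + alpha) / (total[c] + 2 * alpha) for c in total}


def score(log_query_likelihood, context, prior, default=0.5):
    """Combine content evidence with the context prior in log space."""
    return log_query_likelihood + math.log(prior.get(context, default))


# Hypothetical toy data: contexts of all elements and of those judged relevant.
all_contexts = [("sec", "first")] * 4 + [("p", "later")] * 6
relevant_contexts = [("sec", "first")] * 3 + [("p", "later")] * 1

prior = estimate_context_prior(all_contexts, relevant_contexts)

# Two elements with identical content scores are separated by their context.
sec_score = score(-2.0, ("sec", "first"), prior)
p_score = score(-2.0, ("p", "later"), prior)
```

With equal content scores, the element whose context was more often judged relevant in the toy data ranks higher, which is the boosting effect described above.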
