Using Topic Shifts for Focussed Access to XML Repositories

In focussed XML retrieval, a retrieval unit is an XML element that not only contains information relevant to a user query but is also specific to that query. INEX defines a relevant element to be at the right level of granularity if it is exhaustive and specific to the user's request, i.e., it discusses fully the topic requested in the user's query and no other topics. The exhaustivity and specificity dimensions are both expressed in terms of the "quantity" of topics discussed within each element. We therefore propose to use the number of topic shifts in an XML element to express the "quantity" of topics it discusses, as a means of capturing specificity. We experimented with a number of element-specific smoothing methods within the language modelling framework. These methods allow us to adjust the amount of smoothing applied to each XML element according to its number of topic shifts. Using the number of topic shifts combined with element length improves retrieval effectiveness, indicating that the number of topic shifts is useful evidence in focussed XML retrieval.
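As a rough illustration of element-specific smoothing, the sketch below scores an element with a linearly interpolated (Jelinek-Mercer style) language model whose smoothing weight depends on the element's number of topic shifts. The abstract does not give the actual smoothing formulas, so the function names, the mapping in `smoothing_weight`, and the constant `k` are hypothetical choices made for this example only. Element length, which the abstract combines with topic shifts, is not modelled here.

```python
import math
from collections import Counter


def smoothing_weight(topic_shifts: int, k: float = 4.0) -> float:
    """Weight given to the collection model, always strictly between 0 and 1.

    Hypothetical mapping (not taken from the paper): the more topic shifts
    an element has, the more its own model is smoothed towards the collection.
    """
    return (1 + topic_shifts) / (1 + topic_shifts + k)


def score(query_terms, element_terms, topic_shifts,
          collection_counts, collection_size):
    """Log-likelihood of the query under the smoothed element model."""
    lam = smoothing_weight(topic_shifts)
    counts = Counter(element_terms)
    n = len(element_terms)
    log_p = 0.0
    for term in query_terms:
        p_elem = counts[term] / n if n else 0.0
        p_coll = collection_counts.get(term, 0) / collection_size
        p = (1 - lam) * p_elem + lam * p_coll
        if p == 0.0:
            # Term unseen in both the element and the collection.
            return float("-inf")
        log_p += math.log(p)
    return log_p


# Toy collection statistics, invented for the example.
collection_counts = {"xml": 10, "retrieval": 10, "cooking": 80}
collection_size = 100
element = ["xml", "retrieval", "xml", "retrieval"]
query = ["xml", "retrieval"]

focused = score(query, element, 0, collection_counts, collection_size)
broad = score(query, element, 3, collection_counts, collection_size)
```

Under this particular mapping, two elements with identical text but different numbers of topic shifts receive different scores: the element with fewer shifts is smoothed less, so query terms that are frequent in it count for more. Whether smoothing should increase or decrease with topic shifts is a design decision that this sketch fixes arbitrarily.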
