Feature- and Query-Based Table of Contents Generation for XML Documents

The availability of a document's logical structure in XML retrieval allows retrieval systems to return document portions (elements) instead of whole documents. This helps searchers focus their attention on the relevant content within a document. However, other elements, e.g. the siblings or parents of retrieved elements, may also be important, as they provide context for the retrieved elements. A table of contents (TOC) offers an overview of a document and shows its most important elements and their relations to each other. In this paper, we investigate what searchers consider important in automatic TOC generation. We ask searchers to indicate their preferences for element features (depth, length, relevance) in order to generate TOCs that help them complete information seeking tasks. We investigate what these preferences are and what characterizes the TOCs generated from the searchers' settings. The results have implications for the design of intelligent TOC generation approaches for XML retrieval.
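To make the idea of feature-based TOC generation more concrete, here is a minimal Python sketch. It is an illustration only and not the method used in the paper. The sketch assumes three things: each element has a position (depth) in the document tree, a length, and a query-based relevance score; a searcher's preferences take the form of three thresholds (a maximum depth, a minimum length, and a minimum relevance); and an element appears in the TOC only if it passes all three. The names `Element`, `TocSettings` and `build_toc`, the threshold semantics, and the example document are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical model of an XML element in the document tree."""
    title: str
    length: int        # assumed: e.g. number of words in the element
    relevance: float   # assumed: query-based relevance score in [0, 1]
    children: List["Element"] = field(default_factory=list)


@dataclass
class TocSettings:
    """Hypothetical searcher preferences expressed as feature thresholds."""
    max_depth: int
    min_length: int
    min_relevance: float


def build_toc(node: Element, settings: TocSettings, depth: int = 1) -> List[str]:
    """Return indented TOC entries for descendants of `node` that pass all thresholds."""
    entries: List[str] = []
    if depth > settings.max_depth:
        return entries  # deeper elements are excluded by the depth preference
    for child in node.children:
        if child.length < settings.min_length or child.relevance < settings.min_relevance:
            continue  # the element and its whole subtree are left out of the TOC
        entries.append("  " * (depth - 1) + child.title)
        entries.extend(build_toc(child, settings, depth + 1))
    return entries


# Usage with a small, made-up document tree.
doc = Element("Article", 5000, 0.9, [
    Element("Introduction", 400, 0.2),
    Element("Methods", 1500, 0.8, [
        Element("Data", 600, 0.7),
        Element("Footnote", 30, 0.9),
    ]),
    Element("Results", 1200, 0.6),
])
toc = build_toc(doc, TocSettings(max_depth=2, min_length=100, min_relevance=0.5))
print("\n".join(toc))
```

With these settings, "Introduction" is dropped because its relevance is too low, and "Footnote" is dropped because it is too short. In this sketch, excluding an element also excludes its subtree, which keeps the TOC hierarchy consistent. An alternative design would promote qualifying descendants of excluded elements.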
