Summarisation of the logical structure of XML documents

Summarisation is traditionally used to produce summaries of the textual contents of documents. In this paper, we argue that summarisation methods can also be applied to the logical structure of XML documents. Structure summarisation selects the most important elements of the logical structure and focuses the user's attention on the sections, subsections, etc. that are believed to be of particular interest. Structure summaries are shown to users as hierarchical tables of contents. This paper discusses structure summarisation methods that use various features of XML elements to select the document portions on which a user's attention should be focused. We also introduce an evaluation methodology for structure summarisation, and present and compare the results of several summariser versions. We show that data sets used in information retrieval evaluation can be used effectively to produce high-quality (query-independent) structure summaries. We also discuss the choice and effectiveness of particular summariser features with respect to several evaluation measures.
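To make the idea of a structure summary concrete, the following is a minimal sketch, not the summariser evaluated in the paper. It assumes a hypothetical document format (nested `section` elements with a `title` attribute), two illustrative features (element depth and text length), and arbitrary hand-picked weights and threshold; none of these choices is taken from the paper. An element is kept if its score reaches the threshold or if any of its descendants is kept, so that the result remains a hierarchical table of contents.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature weights (assumption): deeper elements are penalised,
# longer elements are rewarded. The paper's features and weights may differ.
WEIGHTS = {"depth": -1.0, "length": 0.01}


def text_length(elem):
    """Characters of text contained in the element and all its descendants."""
    return len("".join(elem.itertext()))


def score(elem, depth):
    """Linear combination of the two illustrative features."""
    return WEIGHTS["depth"] * depth + WEIGHTS["length"] * text_length(elem)


def summarise(elem, threshold, depth=0):
    """Return a (title, children) tuple for a kept element, or None if dropped.

    An element is kept when its own score reaches the threshold or when at
    least one descendant is kept (so the hierarchy stays connected).
    """
    kept_children = []
    for child in elem:
        if child.tag == "section":  # only structural elements are considered
            kept = summarise(child, threshold, depth + 1)
            if kept is not None:
                kept_children.append(kept)
    if score(elem, depth) >= threshold or kept_children:
        return (elem.get("title", elem.tag), kept_children)
    return None


def render(node, indent=0):
    """Flatten the kept tree into indented table-of-contents lines."""
    title, children = node
    lines = ["  " * indent + title]
    for child in children:
        lines.extend(render(child, indent + 1))
    return lines


# Toy document (assumption): tag and attribute names are invented for this sketch.
DOC = (
    '<article title="Paper">'
    '<section title="Introduction">short</section>'
    '<section title="Methods">'
    '<section title="Features">' + "x" * 400 + "</section>"
    '<section title="Notes">tiny</section>'
    "</section>"
    "</article>"
)

if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for line in render(summarise(root, threshold=1.0)):
        print(line)
```

On this toy document, the short "Introduction" and "Notes" elements fall below the threshold and are omitted, while "Methods" and its long "Features" subsection remain in the table of contents. A trained or query-biased summariser would replace the fixed weights with learned ones.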
