Extraction of Relevant Figures and Tables for Multi-document Summarization

We propose a system that extracts the most relevant figures and tables from a set of topically related source documents and integrates them into the extractive text summary produced from the same set. The method is domain independent. Its main component generates a ranked list of relevant candidate units (figures and tables), ordered by their computed relevance. The relevance measure combines local and global scores that account for both direct and indirect references to each unit. To evaluate the system, we created a test collection of document sets that are not restricted to any specific domain. Evaluation experiments show that the system-generated ranked list correlates, with statistical significance, with the rankings given by human evaluators. Our concluding remarks make clear that the proposed system is feasible for summarizing document sets in which figures and tables are salient units.
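As a rough illustration only, the ranking and evaluation steps might be sketched as follows. The abstract does not give the scoring formula, so this sketch assumes a weighted linear combination of a local score (counts of direct and indirect references to the unit in its source document) and a global score (similarity of the unit's caption to the whole document set). The names `CandidateUnit`, `relevance`, `rank_units`, and `spearman_rho`, the weights, and the sample values are all hypothetical. Spearman's rank correlation is used here as one common way to compare a system ranking with a human ranking; the abstract does not state which measure the authors used.

```python
from dataclasses import dataclass


@dataclass
class CandidateUnit:
    """A figure or table from one document in the set (hypothetical structure)."""
    label: str           # e.g. "doc2:Table 1"
    direct_refs: int     # explicit mentions such as "see Table 1" in the source document
    indirect_refs: int   # assumed: sentences that discuss the unit without naming it
    global_sim: float    # assumed: caption similarity to the whole document set, in [0, 1]


def relevance(unit: CandidateUnit, w_direct: float = 1.0,
              w_indirect: float = 0.5, w_global: float = 2.0) -> float:
    """Hypothetical linear combination of a local (reference-based) and a global score."""
    local = w_direct * unit.direct_refs + w_indirect * unit.indirect_refs
    return local + w_global * unit.global_sim


def rank_units(units: list[CandidateUnit]) -> list[str]:
    """Return unit labels ordered from most to least relevant."""
    return [u.label for u in sorted(units, key=relevance, reverse=True)]


def spearman_rho(system: list[str], human: list[str]) -> float:
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    n = len(system)
    human_pos = {label: i for i, label in enumerate(human)}
    d_squared = sum((i - human_pos[label]) ** 2 for i, label in enumerate(system))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical example: three candidate units drawn from three documents.
units = [
    CandidateUnit("doc1:Figure 2", direct_refs=3, indirect_refs=1, global_sim=0.8),
    CandidateUnit("doc2:Table 1", direct_refs=1, indirect_refs=0, global_sim=0.4),
    CandidateUnit("doc3:Figure 1", direct_refs=2, indirect_refs=2, global_sim=0.6),
]
ranking = rank_units(units)
print(ranking)  # ['doc1:Figure 2', 'doc3:Figure 1', 'doc2:Table 1']

# A made-up human judgment that swaps the top two units.
human = ["doc3:Figure 1", "doc1:Figure 2", "doc2:Table 1"]
print(spearman_rho(ranking, human))  # 0.5
```

A correlation value alone does not establish statistical significance; with real data one would also compute a p-value, for example with `scipy.stats.spearmanr` or `scipy.stats.kendalltau`, both of which return the coefficient together with a p-value.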