Hierarchical Summarization: Scaling Up Multi-Document Summarization

Existing multi-document summarization (MDS) systems are designed to produce short, unstructured summaries of 10-15 documents, and they are inadequate for larger document collections. We propose a new approach to scaling up summarization, called hierarchical summarization, and present SUMMA, the first implemented hierarchical summarization system. SUMMA produces a hierarchy of relatively short summaries: the top level provides a general overview, and users can navigate the hierarchy to drill down for more detail on topics of interest. SUMMA optimizes for coherence as well as coverage of salient information. In an Amazon Mechanical Turk evaluation, users preferred SUMMA ten times as often as flat MDS and three times as often as timelines.
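The abstract does not specify SUMMA's internal representation. As a rough illustration of the drill-down navigation it describes, the Python sketch below models a hierarchy in which each sentence of a summary may expand into a more detailed child summary. The names `SummaryNode`, `drill_down`, and `render` are hypothetical, and the sentence-to-child mapping is an assumption rather than a description of SUMMA's implementation. The sketch also leaves out the coherence and coverage optimization entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SummaryNode:
    """A short summary whose sentences can expand into more detailed child summaries.

    Hypothetical structure for illustration only; not SUMMA's actual data model.
    """
    sentences: list[str]
    # Assumption: maps a sentence index to the child summary that elaborates on it.
    children: dict[int, SummaryNode] = field(default_factory=dict)

    def drill_down(self, index: int) -> SummaryNode | None:
        """Return the child summary for sentence `index`, or None if it has no expansion."""
        return self.children.get(index)


def render(node: SummaryNode, depth: int = 0) -> list[str]:
    """Pre-order walk: each sentence is followed by its indented expansion, if any."""
    lines: list[str] = []
    for i, sentence in enumerate(node.sentences):
        lines.append("  " * depth + sentence)
        child = node.drill_down(i)
        if child is not None:
            lines.extend(render(child, depth + 1))
    return lines


# Usage: a two-level hierarchy with placeholder text.
root = SummaryNode(
    sentences=["Overview sentence A.", "Overview sentence B."],
    children={0: SummaryNode(sentences=["Detail A1.", "Detail A2."])},
)
print("\n".join(render(root)))
```

Here the top-level node plays the role of the general overview, and calling `drill_down` on a sentence index corresponds to a user expanding that topic; sentences without an entry in `children` are leaves.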
