A Supervised Aggregation Framework for Multi-Document Summarization

Sentence ranking plays a vital role in most summarization approaches. Most previous work explored different features and combined them into a single unified ranking method. However, ranking sentences from a single point of view is imprecise, because each feature contributes to the ranking in only one way in these methods. In this paper, we propose a novel supervised aggregation approach for summarization that combines different summarization methods, including LexPageRank, LexHITS, the manifold-ranking method, and DivRank. Human-labeled data are used to train an optimization model that combines these summarizers and learns the weight assigned to each individual summarizer. Experiments on the DUC2004 data set demonstrate the effectiveness of the supervised aggregation method compared with typical ensemble approaches. We also investigate the influence of training data construction and component diversity on the summarization results.
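The general idea of learning one weight per summarizer from labeled data can be illustrated with a minimal sketch. The abstract does not specify the optimization model, so the least-squares objective, the min-max normalization, and the names `normalize`, `learn_weights`, and `aggregate` below are all assumptions made for illustration, not the method of the paper.

```python
import numpy as np


def normalize(scores):
    """Min-max scale one summarizer's sentence scores to [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def learn_weights(score_matrix, labels):
    """Fit one weight per summarizer (assumed least-squares objective).

    score_matrix: shape (n_sentences, n_summarizers), one column per summarizer.
    labels: shape (n_sentences,), human-derived sentence scores.
    """
    weights, *_ = np.linalg.lstsq(score_matrix, labels, rcond=None)
    weights = np.clip(weights, 0.0, None)  # assumption: non-negative weights
    total = weights.sum()
    if total == 0:
        # Fall back to uniform weights if every fitted weight was clipped.
        return np.full(len(weights), 1.0 / len(weights))
    return weights / total


def aggregate(score_matrix, weights):
    """Combine the summarizers' scores into one score per sentence."""
    return score_matrix @ weights


# Toy data: four sentences scored by two hypothetical summarizers.
scores = np.column_stack([
    normalize([3, 1, 2, 0]),
    normalize([0, 2, 1, 3]),
])
labels = scores[:, 0]  # pretend the human labels agree with summarizer 0
weights = learn_weights(scores, labels)
ranking = np.argsort(-aggregate(scores, weights))  # best sentence first
```

In a real setting each column would hold the sentence scores produced by one component summarizer (for example LexPageRank or DivRank), and the labels would be derived from human reference summaries.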
