AdaSum: an adaptive model for summarization

Topic representation mismatch is a key problem in topic-oriented summarization for the specified topic is usually too short to understand/interpret. This paper proposes a novel adaptive model for summarization, AdaSum, under the assumption that the summary and the topic representation can be mutually boosted. AdaSum aims to simultaneously optimize the topic representation and extract effective summaries. This model employs a mutual boosting process to minimize the topic representation mismatch for base summarizers. Furthermore, a linear combination of base summarizers is proposed to further reduce the topic representation mismatch from the diversity of base summarizers with a general learning framework. We prove that the training process of AdaSum can enhance the performance measure used. Experimental results on DUC 2007 dataset show that AdaSum significantly outperforms the baseline methods for summarization (e.g. MRP, LexRank, and GSPS).

[1]  Eduard Hovy,et al.  Automated Text Summarization in SUMMARIST , 1997, ACL 1997.

[2]  M. Sanderson Book Reviews: Advances in Automatic Text Summarization , 2000, Computational Linguistics.

[3]  Sanda M. Harabagiu,et al.  Topic themes for multi-document summarization , 2005, SIGIR '05.

[4]  Mario A. Nascimento,et al.  Information-content based sentence extraction for text summarization , 2004, International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004..

[5]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[6]  Hang Li,et al.  AdaRank: a boosting algorithm for information retrieval , 2007, SIGIR.

[7]  Jin Zhang,et al.  GSPSummary: A Graph-Based Sub-topic Partition Algorithm for Summarization , 2008, AIRS.

[8]  Dragomir R. Radev,et al.  LexRank: Graph-based Lexical Centrality as Salience in Text Summarization , 2004, J. Artif. Intell. Res..

[9]  Dragomir R. Radev,et al.  Centroid-based summarization of multiple documents , 2004, Inf. Process. Manag..

[10]  Eduard H. Hovy,et al.  The Automated Acquisition of Topic Signatures for Text Summarization , 2000, COLING.

[11]  Hoa Trang Dang,et al.  Overview of DUC 2005 , 2005 .

[12]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..

[13]  Daniel Marcu,et al.  Statistics-Based Summarization - Step One: Sentence Compression , 2000, AAAI/IAAI.

[14]  Jaime Carbonell,et al.  Multi-Document Summarization By Sentence Extraction , 2000 .

[15]  Regina Barzilay,et al.  Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization , 2004, NAACL.

[16]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences by , 2013 .

[17]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[18]  J. Friedman Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .

[19]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[20]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[21]  Hongyuan Zha,et al.  Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering , 2002, SIGIR '02.

[22]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[23]  David P. Helmbold,et al.  Boosting Methods for Regression , 2002, Machine Learning.