Paper information - A Survey of Text Summarization Techniques

A Survey of Text Summarization Techniques

Numerous approaches for identifying important content for automatic text summarization have been developed to date. Topic representation approaches first derive an intermediate representation of the text that captures the topics discussed in the input. Based on these representations of topics, sentences in the input document are scored for importance. In contrast, in indicator representation approaches, the text is represented by a diverse set of possible indicators of importance which do not aim at discovering topicality. These indicators are combined, very often using machine learning techniques, to score the importance of each sentence. Finally, a summary is produced either by selecting sentences greedily, choosing the sentences that will go in the summary one by one, or by globally optimizing the selection, choosing the best set of sentences to form a summary. In this chapter we give a broad overview of existing approaches based on these distinctions, with particular attention to how representation, sentence scoring, or summary selection strategies alter the overall performance of the summarizer. We also point out some of the peculiarities of the task of summarization which have posed challenges to machine learning approaches for the problem, and some of the suggested solutions.
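The pipeline outlined in the abstract (topic representation, sentence scoring, greedy selection) can be illustrated with a minimal sketch. It assumes the simplest possible topic representation, unigram word probabilities over the input, and scores each sentence by the average probability of its words. The function names, the tiny stopword list, and the naive sentence splitter are all hypothetical choices made for illustration; this is not the authors' method or any specific system from the survey, and it makes no attempt to avoid redundant sentences.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are", "for", "on", "that", "it"}


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    sentences = split_sentences(text)

    # Topic representation: unigram probabilities estimated from the input.
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    if total == 0:
        return []
    prob = {w: c / total for w, c in counts.items()}

    def score(sentence):
        # Sentence importance: average probability of its content words.
        words = tokenize(sentence)
        return sum(prob[w] for w in words) / len(words) if words else 0.0

    # Greedy selection: add the best remaining sentence, one at a time.
    chosen = []
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda i: score(sentences[i]))
        chosen.append(best)
        remaining.remove(best)

    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = "Cats chase mice. Cats sleep. Dogs bark loudly at night."
    print(summarize(doc, max_sentences=2))
```

A global-optimization variant would instead search over whole sets of sentences under a length budget rather than committing to one sentence at a time, which is the distinction the abstract draws in its final selection step.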

Ani Nenkova | Kathleen McKeown
