Classification of Sentence Ranking Methods for Multi-Document Summarization

Modern information technology allows text to be produced and disseminated at a very rapid pace. This leads to the problem of information overload, in which users face a very large body of text relevant to an information need but have no efficient and effective way to locate the specific information they need within it. For example, a user might be given a collection of digital news articles about a particular current event and need to rapidly produce a summary of the essential information about that event contained in the articles. This is the task of multi-document summarization (MDS). In extractive MDS, the most fundamental task is to select a subset of the sentences in the input document set to form a summary of that set. An essential component of this task is sentence ranking, in which sentences from the original document set are ranked in order of importance for inclusion in a summary. The purpose of this chapter is to analyze the most successful methods for sentence ranking that have been employed in recent MDS work. To this end, the authors classify sentence ranking methods into six classes and discuss specific approaches within each class.

Sean Sovine, Marshall University, USA
Hyoil Han, Marshall University, USA
