Comparative Evaluation of Term-Weighting Methods for Automatic Summarization

Abstract

Term-based summarization assumes that the importance of a sentence can be determined from the words it contains. To this end, words are weighted using term-weighting measures, and these word weights are in turn used to weight the sentences. This article presents a comparative evaluation of summaries produced using different term-weighting measures and different combinations of the parameters used to calculate them. The evaluation reveals that in many cases simple methods such as term frequency can produce informative summaries.
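
The two-step idea described above (weight the words, then weight the sentences from their words) can be illustrated with a minimal sketch. It assumes raw term frequency as the word weight and a plain sum as the sentence score; the abstract does not specify the exact formulas or parameters evaluated, so the stop-word list, the tokenizer, and the function names `tokenize` and `summarize` are all hypothetical choices made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on", "it"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(sentences, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    # Step 1: weight each word by its raw frequency in the whole document.
    tf = Counter(w for s in sentences for w in tokenize(s))
    # Step 2: score each sentence as the sum of the weights of its words.
    scores = [sum(tf[w] for w in tokenize(s)) for s in sentences]
    # Rank sentence indices by score (ties keep document order), take the
    # top n, then restore document order for readability.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


document = ["Cats chase mice.", "Cats and dogs chase cats.", "Birds sing."]
print(summarize(document, n=1))
```

Summing the weights favours longer sentences; dividing by sentence length or swapping in a different term-weighting measure would be alternative parameter choices of the kind such a comparison might explore.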
