Assessing shallow sentence scoring techniques and combinations for single and multi-document summarization

We investigate eighteen shallow sentence scoring techniques and ensemble strategies. Experiments were performed on several datasets for single- and multi-document tasks. Ensemble strategies lead to improvements over the individual scoring techniques. Ensembles that perform competitively against the state of the art were identified.

The volume of text data has grown exponentially in recent years, mainly due to the Internet. Automatic text summarization has emerged as a way to help users find relevant information in one or more documents. This paper presents a comparative analysis of eighteen shallow sentence scoring techniques for computing the importance of a sentence in extractive single- and multi-document summarization. Several experiments were conducted to assess the performance of these techniques, both individually and under different combination strategies. Results on the most traditional benchmark in the news domain demonstrate the feasibility of combining such techniques, which in most cases outperforms the isolated techniques. Combinations that perform competitively with state-of-the-art systems were found.
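To make the idea of combining shallow scorers concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes two illustrative scorers (average word frequency and sentence position), min-max normalization, and an unweighted average as the combination strategy. All function names, the regex-based sentence splitter, and the choice of scorers are assumptions made for illustration only.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphanumeric tokens; no stop-word removal or stemming.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def word_frequency_scores(sentences):
    # Score each sentence by the mean document-level frequency of its words.
    counts = Counter(w for s in sentences for w in tokenize(s))
    scores = []
    for s in sentences:
        words = tokenize(s)
        scores.append(sum(counts[w] for w in words) / len(words) if words else 0.0)
    return scores


def position_scores(sentences):
    # Earlier sentences receive higher scores, decreasing linearly.
    n = len(sentences)
    return [(n - i) / n for i in range(n)]


def min_max(scores):
    # Rescale scores to [0, 1] so that different scorers are comparable.
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(x - lo) / (hi - lo) for x in scores]


def ensemble_scores(sentences, scorers):
    # Unweighted average of the normalized scores from every scorer.
    normalized = [min_max(scorer(sentences)) for scorer in scorers]
    return [sum(column) / len(column) for column in zip(*normalized)]


def summarize(text, k=2):
    # Select the k best-scoring sentences and return them in document order.
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = ensemble_scores(sentences, [word_frequency_scores, position_scores])
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    print(summarize("Cats chase mice. Dogs chase cats. Birds sing.", k=2))
```

Averaging normalized scores is only one possible combination strategy. Weighted sums, rank aggregation, or a learned classifier over the scores could be substituted in `ensemble_scores` without changing the rest of the sketch.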
