Automatic text summarization: A comprehensive survey

Abstract: Automatic Text Summarization (ATS) is becoming increasingly important because of the exponentially growing volume of textual content on the Internet and in archives of news articles, scientific papers, legal documents, and similar sources. Manual summarization costs considerable time, effort, and money, and becomes impractical at this scale. Researchers have been working to improve ATS techniques since the 1950s. ATS approaches are extractive, abstractive, or hybrid. The extractive approach selects the most important sentences in the input document(s) and concatenates them to form the summary. The abstractive approach converts the input document(s) into an intermediate representation and then generates a summary whose sentences differ from the original sentences. The hybrid approach combines the extractive and abstractive approaches. Despite all the proposed methods, generated summaries still fall well short of human-written summaries. Most research focuses on the extractive approach; more attention is needed on the abstractive and hybrid approaches. This survey gives researchers a comprehensive overview of the different aspects of ATS: approaches, methods, building blocks, techniques, datasets, evaluation methods, and future research directions.
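To make the extractive approach concrete, the following is a minimal sketch, not a method taken from the survey. It assumes a simple word-frequency scoring scheme: each sentence is scored by the average document frequency of its non-stopword terms, the top-scoring sentences are selected, and they are concatenated in their original order. The function names, the tiny stopword list, and the naive sentence splitter are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "on", "for", "that", "this", "with", "as", "are", "was"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, num_sentences=2):
    """Select the highest-scoring sentences and join them in document order."""
    sentences = split_sentences(text)
    # Term frequencies over the whole input (assumed importance signal).
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average term frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    doc = ("Summarization shortens text. "
           "Summarization of text saves time for readers of text. "
           "Dogs bark.")
    print(extractive_summary(doc, num_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is what distinguishes extractive from abstractive summarization. A realistic system would replace the frequency score with richer features and handle redundancy between selected sentences.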
