Features in Extractive Supervised Single-document Summarization: Case of Persian News

Text summarization has been one of the most challenging areas of research in NLP. Much effort has gone into addressing this challenge with either abstractive or extractive methods. Extractive methods are more popular because they are simpler than the more elaborate abstractive methods. In extractive approaches, the system does not generate sentences. Instead, it learns to score the sentences of a text using textual features and then selects those with the highest rank. The core objective is therefore ranking, and the ranking depends heavily on the document. This dependency has gone unnoticed by many state-of-the-art solutions. In this work, the features of the document are integrated into the vector of every sentence. In this way, the system becomes informed about the context, which increases the precision of the learned model and consequently produces comprehensive and brief summaries.
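The idea of integrating document features into every sentence vector can be illustrated with a minimal Python sketch. The specific features used here (relative position, sentence length, sentence count, mean sentence length) and the function names are assumptions chosen for illustration only; they are not necessarily the feature set or implementation used in this work.

```python
from typing import List


def sentence_features(sentence: str, position: int, n_sentences: int) -> List[float]:
    """Hypothetical sentence-level features: relative position and word count."""
    words = sentence.split()
    return [
        position / max(n_sentences - 1, 1),  # 0.0 for the first sentence, 1.0 for the last
        float(len(words)),                   # sentence length in whitespace-separated words
    ]


def document_features(sentences: List[str]) -> List[float]:
    """Hypothetical document-level features: sentence count and mean sentence length."""
    n_words = sum(len(s.split()) for s in sentences)
    return [float(len(sentences)), n_words / len(sentences)]


def build_vectors(sentences: List[str]) -> List[List[float]]:
    """Append the same document features to every sentence vector,
    so a downstream ranking model sees each sentence together with its context."""
    doc = document_features(sentences)
    n = len(sentences)
    return [sentence_features(s, i, n) + doc for i, s in enumerate(sentences)]


if __name__ == "__main__":
    doc_sentences = ["a b c", "d e", "f"]
    for vec in build_vectors(doc_sentences):
        print(vec)
```

Each resulting vector could then be passed to any supervised regressor or classifier that scores sentences; because the last two components are identical for all sentences of one document, the model can learn that, for example, a given sentence length means something different in a short document than in a long one.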
