Textual keyword extraction and summarization: State-of-the-art

Abstract With the advent of Web 2.0, many online platforms, such as social networks, blogs, and magazines, produce massive amounts of textual data. This textual data carries information that can be used for the betterment of humanity. Hence, there is a dire need to extract potentially useful information from it. This study presents an overview of approaches that can be applied to extract these valuable information nuggets from text and then present them in a brief, clear, and concise way. In this regard, two major tasks are reviewed: automatic keyword extraction and text summarization. To compile the literature, scientific articles were collected from major digital computing research repositories. In light of the acquired literature, the survey covers early approaches through to recent advancements using machine learning solutions. The survey findings conclude that annotated benchmark datasets for various textual data generators, such as Twitter and social forums, are not available. This scarcity of datasets has resulted in relatively little progress in many domains. Also, applications of deep learning techniques to automatic keyword extraction remain relatively unaddressed. Hence, the impact of various deep architectures stands as an open research direction. For the text summarization task, deep learning techniques have been applied since the advent of word vectors and currently define the state of the art for abstractive summarization. Currently, one of the major challenges in these tasks is semantic-aware evaluation of generated results.
