Keyphrase Extraction from Scientific Articles via Extractive Summarization

Automatically extracting keyphrases from scholarly documents yields a valuable, concise representation that humans can understand and machines can process for tasks such as information retrieval, article clustering, and article classification. This paper examines which parts of a scientific article should be given as input to keyphrase extraction methods. Recent deep learning methods take only titles and abstracts as input because of the computational cost of processing long sequences, whereas traditional approaches can also work with full texts. Titles and abstracts are dense in keyphrases but often miss important aspects of an article, while full texts are richer in keyphrases but much noisier. To address this trade-off, we propose applying extractive summarization models to the full texts of scholarly documents. Our empirical study on three article collections using three keyphrase extraction methods shows promising results.
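
The summarize-then-extract idea can be illustrated with a minimal sketch. It is not the summarization models or keyphrase extraction methods evaluated in the paper: the frequency-based sentence scorer, the n-gram counter, the stopword list, and all function names (`summarize`, `extract_keyphrases`) are assumptions chosen only to keep the example self-contained.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "for", "to",
             "is", "are", "was", "were", "we", "our", "with", "by", "as"}

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(text):
    """Lowercased alphabetic tokens of at least two characters."""
    return re.findall(r"[a-z][a-z\-]+", text.lower())

def summarize(full_text, k=3):
    """Stand-in extractive summarizer: keep the k sentences whose
    non-stopword tokens are, on average, most frequent in the document."""
    sentences = split_sentences(full_text)
    freq = Counter(w for w in words(full_text) if w not in STOPWORDS)

    def score(sentence):
        toks = [w for w in words(sentence) if w not in STOPWORDS]
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:k]
    # Restore document order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(top))

def extract_keyphrases(text, top_n=5, max_len=3):
    """Stand-in keyphrase extractor: count n-grams that neither start nor
    end with a stopword, weighting each occurrence by its length in words."""
    counts = Counter()
    for sentence in split_sentences(text):
        toks = words(sentence)
        for n in range(1, max_len + 1):
            for i in range(len(toks) - n + 1):
                gram = toks[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += n
    return [phrase for phrase, _ in counts.most_common(top_n)]

def keyphrases_from_full_text(full_text, k=3, top_n=5):
    """Pipeline: reduce the noisy full text to a short summary, then
    run keyphrase extraction on the summary instead of the whole article."""
    return extract_keyphrases(summarize(full_text, k=k), top_n=top_n)
```

In this sketch the summary plays the role that titles and abstracts play for the deep learning methods: a short input that an extractor can process cheaply, but one drawn from the full text. Swapping in a stronger summarizer or extractor only requires replacing the corresponding function.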
