A Context-based Framework for Modeling the Role and Function of On-line Resource Citations in Scientific Literature

We introduce a new task: modeling the role and function of on-line resource citations in scientific literature. Categorizing on-line resources and analyzing the purpose of their citations in scientific texts can help resource search and recommendation systems better understand and manage scientific resources. For this task, we create the first annotation scheme, which models information at different levels of granularity from a hierarchical perspective, and we construct a dataset, SciRes, which includes 3,088 manually annotated resource contexts. We propose a multi-task framework to build a scientific resource classifier (SciResCLF) that jointly recognizes the role and function types, and we then use the classification results to support a scientific resource recommendation (SciResREC) task. Experiments show that our model achieves the best results on both the classification task and the recommendation task. The SciRes dataset is released for future research.
