Investigating Convolutional Networks and Domain-Specific Embeddings for Semantic Classification of Citations

Citation graphs and indices underpin most bibliometric analyses. However, measures derived from citation graphs do not provide insight into the qualitative aspects of scientific publications. In this work, we aim to characterize citations semantically in terms of polarity and purpose. We frame polarity detection and purpose detection as classification tasks and investigate the performance of convolutional networks with general and domain-specific word embeddings on these tasks. Our best-performing model outperforms previously reported results on a benchmark dataset by a wide margin.
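
To make the classification framing concrete, the following is a minimal sketch of how a citation sentence could pass through a convolutional classifier over word embeddings. It is not the architecture evaluated in this work, which the abstract does not specify. It assumes a single convolutional layer with ReLU and max-over-time pooling followed by a softmax layer. The label set, the example sentence, the filter widths, and every function name are hypothetical, and the embeddings and weights are random and untrained, so the output probabilities carry no meaning.

```python
import math
import random

# Hypothetical label set; the abstract names the task (polarity) but not its labels.
POLARITY_LABELS = ["negative", "neutral", "positive"]


def make_embeddings(vocab, dim, seed=0):
    """Toy random embedding table standing in for pretrained word vectors."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}


def conv_max_pool(vectors, filt, bias):
    """Slide one filter over windows of len(filt) tokens, apply ReLU, keep the maximum."""
    width = len(filt)
    best = 0.0  # ReLU floor: the pooled value is never below zero
    for start in range(len(vectors) - width + 1):
        total = bias
        for offset in range(width):
            row, vec = filt[offset], vectors[start + offset]
            total += sum(w * x for w, x in zip(row, vec))
        best = max(best, total)
    return best


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(tokens, embeddings, filters, out_weights, out_bias):
    """Embed tokens, pool one feature per filter, and score each class."""
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings.get(t, [0.0] * dim) for t in tokens]  # unknown words map to zeros
    features = [conv_max_pool(vectors, f, b) for f, b in filters]
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(scores)


# Untrained, randomly initialised parameters (assumptions for illustration only).
rng = random.Random(1)
DIM, WIDTHS = 8, (2, 3)
tokens = "this approach outperforms the baseline of [CIT]".split()
embeddings = make_embeddings(sorted(set(tokens)), DIM)
filters = [
    ([[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(w)], 0.0)
    for w in WIDTHS
    for _ in range(2)  # two filters per window width
]
out_weights = [[rng.uniform(-1, 1) for _ in filters] for _ in POLARITY_LABELS]
out_bias = [0.0] * len(POLARITY_LABELS)

probs = classify(tokens, embeddings, filters, out_weights, out_bias)
for label, p in zip(POLARITY_LABELS, probs):
    print(f"{label}: {p:.3f}")
```

In this sketch, moving from general to domain-specific embeddings would only change the table passed as `embeddings`; the convolution, pooling, and output layers stay the same.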
