Why Did You Not Compare With That? Identifying Papers for Use as Baselines

We propose the task of automatically identifying papers used as baselines in a scientific article. We frame the problem as a binary classification task in which every reference in a paper is classified as either a baseline or a non-baseline. The problem is difficult because a baseline reference can appear in a paper in many different ways. We build a dataset of 2,075 papers from the ACL Anthology corpus, with all their references manually annotated as one of the two classes. We develop a multi-module attention-based neural classifier for this task that outperforms four state-of-the-art citation role classification methods applied to the same task. We also present an analysis of the errors made by the proposed classifier, highlighting the factors that make baseline identification hard.
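The binary framing described above can be illustrated with a minimal sketch. It is not the proposed neural classifier: the `Reference` record, the cue-word list, and the toy `predict` rule are all hypothetical, chosen only to show what the inputs, the labels, and an F1-style evaluation might look like under this formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    """One cited paper, as seen from the citing paper (hypothetical schema)."""
    ref_id: str
    contexts: List[str]  # sentences in the citing paper that mention this reference
    is_baseline: bool    # gold label: baseline (True) or non-baseline (False)


# Hypothetical cue words, not taken from the paper.
CUES = ("baseline", "compare", "outperform")


def predict(ref: Reference) -> bool:
    """Toy rule: flag a reference if any citation context contains a cue word."""
    return any(cue in ctx.lower() for ctx in ref.contexts for cue in CUES)


def f1_score(refs: List[Reference], preds: List[bool]) -> float:
    """F1 for the positive (baseline) class."""
    tp = sum(p and r.is_baseline for r, p in zip(refs, preds))
    fp = sum(p and not r.is_baseline for r, p in zip(refs, preds))
    fn = sum((not p) and r.is_baseline for r, p in zip(refs, preds))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up references for illustration only.
refs = [
    Reference("A", ["We compare our system against A."], True),
    Reference("B", ["B introduced the corpus we use."], False),
    Reference("C", ["Our results are reported alongside C."], True),
]
preds = [predict(r) for r in refs]
print(preds, round(f1_score(refs, preds), 3))
```

Reference C is a baseline that the toy rule misses because its context carries no explicit cue word, which is one small instance of the many ways a baseline can be mentioned.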
