Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction

Training recurrent neural networks on long texts, in particular scholarly documents, poses problems for learning. While hierarchical attention networks (HANs) are effective in solving these problems, they still lose important information about the structure of the text. To address this, we propose combining HANs with structure-tags that mark the role of each sentence in the document. Tagging sentences as belonging to the title, abstract, or main body text yields improvements over the state of the art for scholarly document quality prediction: substantial gains on average against other models and consistent improvements over HANs without structure-tags. The proposed system is applied to the task of accept/reject prediction on the PeerRead dataset and compared against a recent BiLSTM-based model and a joint textual+visual model. It gains 4.7% accuracy over the best of both models on the computation and language domain and loses 2.4% against the best of both on the machine learning domain. Compared to plain HANs, accuracy increases on both domains, by 1.5% and 2.0% respectively. We also obtain improvements from the tags when predicting the number of citations for 88k scientific publications compiled from the Allen AI S2ORC dataset. Our HAN system with structure-tags reaches 28.5% explained variance, an improvement of 1.0% over HANs without structure-tags.
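The tagging scheme itself is simple to sketch. Below is a minimal illustration in Python of prepending structure-tags to sentences before they are fed to the HAN; the tag token names (<TITLE>, <ABSTRACT>, <BODY>) and the document representation are assumptions made for illustration, not the paper's exact implementation.

```python
# Minimal sketch of structure-tag augmentation. The tag tokens <TITLE>,
# <ABSTRACT>, and <BODY> and the input format are illustrative assumptions;
# the abstract only specifies that tags mark each sentence's role.

def add_structure_tags(document):
    """Prefix every sentence with a token naming its structural role.

    `document`: dict with keys 'title', 'abstract', 'body', each a list
    of tokenized sentences (lists of word strings).
    Returns a flat list of tagged sentences in document order.
    """
    tagged = []
    for role, tag in (("title", "<TITLE>"),
                      ("abstract", "<ABSTRACT>"),
                      ("body", "<BODY>")):
        for sentence in document[role]:
            tagged.append([tag] + sentence)
    return tagged  # consumed by the HAN like any other sentence sequence

doc = {
    "title": [["structure-tags", "improve", "text", "classification"]],
    "abstract": [["we", "propose", "hans", "with", "structure-tags"]],
    "body": [["training", "rnns", "on", "long", "texts", "is", "hard"]],
}
for sent in add_structure_tags(doc):
    print(" ".join(sent))
```

The tag tokens receive their own embeddings during training, so the sentence encoder can condition on a sentence's role without any architectural change to the HAN.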
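The citation-count results are reported as explained variance. For reference, the standard definition of that metric (matching, e.g., scikit-learn's explained_variance_score) is sketched below; this is the conventional formula, not code from the paper.

```python
import numpy as np

def explained_variance(y_true, y_pred):
    """Explained variance: 1 - Var(y_true - y_pred) / Var(y_true).
    Equals 1 for perfect predictions and 0 for always predicting the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

print(explained_variance([10, 3, 7], [8, 4, 6]))  # ~0.81
```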
