Hierarchical bidirectional attention-based RNN in BioCreative VI precision medicine track, document triage task

In this paper, we describe our submission to the "Document Triage Task" of the BioCreative VI Precision Medicine Track, in which we ranked first among ten teams. The submitted system is a Hierarchical Bidirectional Attention-Based Recurrent Neural Network (RNN). Our approach exploits the hierarchical nature of documents: a document is a sequence of sentences, and each sentence is a sequence of words. We propose a reusable sequence encoder architecture, which serves as both the sentence encoder and the document encoder. The sequence encoder is a bidirectional RNN equipped with an attention mechanism that identifies and captures the most important elements (words or sentences) in a sequence. Furthermore, we argue that the title of a paper usually carries more important information than the other sentences of the abstract. For this reason, we propose a shortcut connection that integrates the title's vector representation directly into the final feature representation of the document. We initialize the embedding layer of our network with word embeddings pre-trained on PubMed. Our system does not rely on handcrafted features, and we train it end-to-end using back-propagation with stochastic gradient descent. We make the source code available to the research community.

Keywords: Document Classification; Hierarchical Recurrent Neural Network; GRU; Attention Layer
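To make the described architecture concrete, the following is a minimal sketch of a hierarchical bidirectional attention-based network with a title shortcut, written in PyTorch. It is an illustration under stated assumptions, not the authors' implementation: the module names, layer sizes, the use of a GRU, and the convention that sentence 0 is the title are all choices made here for the example.

```python
# A minimal sketch of a hierarchical bidirectional attention RNN with a
# title shortcut connection. All names and shapes are illustrative
# assumptions, not the paper's exact implementation.
import torch
import torch.nn as nn


class AttentiveEncoder(nn.Module):
    """Reusable sequence encoder: a bidirectional GRU whose hidden states
    are pooled into one vector by an additive attention mechanism."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size,
                          batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_size, 2 * hidden_size)
        self.context = nn.Linear(2 * hidden_size, 1, bias=False)

    def forward(self, x):
        h, _ = self.rnn(x)                                # (B, T, 2H)
        scores = self.context(torch.tanh(self.attn(h)))   # (B, T, 1)
        alpha = torch.softmax(scores, dim=1)              # attention weights
        return (alpha * h).sum(dim=1)                     # (B, 2H)


class HierarchicalClassifier(nn.Module):
    """Words -> sentence vectors -> document vector; the title's sentence
    vector is concatenated to the document vector (the shortcut)."""

    def __init__(self, embed_dim, hidden_size, num_classes=2):
        super().__init__()
        # The same encoder architecture is reused at both levels.
        self.sent_enc = AttentiveEncoder(embed_dim, hidden_size)
        self.doc_enc = AttentiveEncoder(2 * hidden_size, hidden_size)
        self.out = nn.Linear(4 * hidden_size, num_classes)

    def forward(self, docs):
        # docs: (B, S, T, E) = batch of documents with S sentences of
        # T words each; E-dim word embeddings would be initialized from
        # PubMed-trained vectors. Sentence 0 is assumed to be the title.
        B, S, T, E = docs.shape
        sents = self.sent_enc(docs.view(B * S, T, E)).view(B, S, -1)
        title = sents[:, 0, :]             # title shortcut vector
        doc = self.doc_enc(sents)          # attention over sentences
        return self.out(torch.cat([doc, title], dim=1))
```

The shortcut lets the classifier see the title representation directly, rather than only through the document-level attention weights; the whole model is differentiable and can be trained end-to-end with a standard cross-entropy loss and stochastic gradient descent.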
