Position-aware Self-attention with Relative Positional Encodings for Slot Filling

This paper describes how to apply self-attention with relative positional encodings to the task of relation extraction. We propose to use a self-attention encoder layer together with an additional position-aware attention layer that takes into account the positions of the query and object entities in the sentence. The self-attention encoder also uses a custom implementation of relative positional encodings, which allows each word in the sentence to take its left and right context into account. The model is evaluated on the TACRED dataset. It relies solely on attention (no recurrent or convolutional layers are used) and improves performance with respect to the previous state of the art.
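
As a concrete illustration of the relative positional encodings mentioned above, the sketch below implements a single-head self-attention layer in the style of Shaw et al. (2018), in which each attention score and each attended value is augmented with a learned embedding of the clipped relative offset between two positions. This is a minimal sketch under stated assumptions, not the paper's implementation: the module name `RelativeSelfAttention`, the clipping distance `max_rel_dist`, and the single-head PyTorch formulation are all illustrative choices.

```python
# Minimal sketch of self-attention with relative positional encodings
# (Shaw et al., 2018 style). Names and hyperparameters are illustrative
# assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeSelfAttention(nn.Module):
    def __init__(self, d_model: int, max_rel_dist: int = 8):
        super().__init__()
        self.max_rel_dist = max_rel_dist
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # One embedding per clipped relative offset in [-max_rel_dist, max_rel_dist].
        self.rel_k = nn.Embedding(2 * max_rel_dist + 1, d_model)
        self.rel_v = nn.Embedding(2 * max_rel_dist + 1, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        b, n, d = x.size()
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)

        # Relative offsets j - i, clipped so distant words share an embedding;
        # the sign of the offset distinguishes left from right context.
        pos = torch.arange(n, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist, self.max_rel_dist)
        rel = rel + self.max_rel_dist           # shift indices to [0, 2*max_rel_dist]
        a_k = self.rel_k(rel)                   # (n, n, d)
        a_v = self.rel_v(rel)                   # (n, n, d)

        # Content term plus relative-position term in the attention scores.
        scores = torch.matmul(q, k.transpose(-2, -1))           # (b, n, n)
        scores = scores + torch.einsum('bid,ijd->bij', q, a_k)
        attn = F.softmax(scores / d ** 0.5, dim=-1)

        # Weighted sum of values, also augmented with relative-position embeddings.
        out = torch.matmul(attn, v)                              # (b, n, d)
        out = out + torch.einsum('bij,ijd->bid', attn, a_v)
        return out
```

Clipping the relative offset means that all positions beyond the maximum distance share a single embedding, so the layer can distinguish nearby left and right context while still generalizing to sentences longer than those seen during training.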
