Deep Semantic Role Labeling with Self-Attention

Semantic Role Labeling (SRL) is widely regarded as a crucial step toward natural language understanding and has been studied extensively. In recent years, end-to-end SRL with recurrent neural networks (RNNs) has gained increasing attention. However, handling structural information and long-range dependencies remains a major challenge for RNNs. In this paper, we present a simple and effective architecture for SRL that addresses these problems. Our model is based on self-attention, which can directly capture the relationship between any two tokens regardless of their distance. Our single model achieves F$_1=83.4$ on the CoNLL-2005 shared task dataset and F$_1=82.7$ on the CoNLL-2012 shared task dataset, outperforming the previous state-of-the-art results by $1.8$ and $1.0$ F$_1$ points respectively. In addition, our model is computationally efficient, parsing 50K tokens per second on a single Titan X GPU.
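The claim that self-attention connects any two tokens in a single step can be illustrated with a minimal sketch of one scaled dot-product attention head, the building block popularized by "Attention is All you Need". This is illustrative NumPy code, not the paper's implementation; the function and variable names are ours, and the real model stacks many such heads with additional layers.

```python
import numpy as np

def scaled_dot_product_self_attention(x, wq, wk, wv):
    """One self-attention head over a sequence x of shape (n, d).

    Every token attends to every other token via the (n, n) score
    matrix, so the path between any two positions has length 1,
    regardless of how far apart they are in the sentence.
    """
    q, k, v = x @ wq, x @ wk, x @ wv            # query/key/value projections
    scores = q @ k.T / np.sqrt(k.shape[-1])     # (n, n) pairwise scores
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights                 # context vectors, attention map

# Toy example: 5 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
context, attn = scaled_dot_product_self_attention(x, wq, wk, wv)
```

Because the attention map is computed for all token pairs at once, the computation is a handful of dense matrix products, which is also what makes the architecture fast on a GPU compared with the sequential updates of an RNN.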
