Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions

We introduce a new method to tag Multiword Expressions (MWEs) using a linguistically interpretable, language-independent deep learning architecture. We specifically target discontinuity, an under-explored aspect that poses a significant challenge to the computational treatment of MWEs. Two neural architectures are explored: a Graph Convolutional Network (GCN) and multi-head self-attention. The GCN leverages dependency parse information, while self-attention attends to long-range relations. Finally, we propose a combined model that integrates the complementary information from both through a gating mechanism. Experiments on a standard multilingual dataset of verbal MWEs show that our model outperforms the baselines not only on discontinuous MWEs but also in overall F-score.
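
To make the gating idea concrete, below is a minimal sketch (not the authors' code) of how per-token representations from a dependency-based GCN branch and a multi-head self-attention branch might be mixed through a gate before tagging. It assumes PyTorch; the class name GatedGCNAttentionTagger, the single-layer GCN approximation, and all parameter names are hypothetical illustrations rather than the paper's actual architecture.

```python
# Hedged sketch: gated combination of a GCN branch and a self-attention branch
# for token-level MWE tagging. All names are illustrative, not the paper's code.
import torch
import torch.nn as nn


class GatedGCNAttentionTagger(nn.Module):
    def __init__(self, hidden_dim: int, num_heads: int, num_tags: int):
        super().__init__()
        # Self-attention branch: attends to long-range relations between tokens.
        self.self_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # GCN branch (simplified to one layer): aggregates information from
        # dependency-parse neighbours via the adjacency matrix.
        self.gcn_linear = nn.Linear(hidden_dim, hidden_dim)
        # Gate deciding, per token and per dimension, how to mix the two branches.
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_states: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim) contextual token embeddings
        # adjacency:    (batch, seq_len, seq_len) dependency-parse adjacency matrix
        attn_out, _ = self.self_attn(token_states, token_states, token_states)
        gcn_out = torch.relu(self.gcn_linear(adjacency @ token_states))
        g = torch.sigmoid(self.gate(torch.cat([attn_out, gcn_out], dim=-1)))
        mixed = g * attn_out + (1.0 - g) * gcn_out
        return self.classifier(mixed)  # per-token MWE tag scores
```

The sigmoid gate lets the model weigh syntactic-neighbourhood information against long-range attention context separately for each token, which is one way to realise the complementary-information intuition stated in the abstract.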
