Deep Learning Models For Multiword Expression Identification

Multiword expressions (MWEs) are lexical items that can be decomposed into multiple component words but have properties that are unpredictable from those components. In this paper we propose the first deep learning models for token-level identification of MWEs. Specifically, we consider a layered feedforward network, a recurrent neural network, and convolutional neural networks. Our experiments show that convolutional neural networks outperform the previous state of the art for MWE identification, with a convolutional neural network with three hidden layers giving the best performance.
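To make "token-level identification" concrete, the following minimal sketch shows how per-token labels could be grouped back into MWEs. It assumes a simple B/I/O tagging scheme (B begins an expression, I continues it, O is outside) and handles contiguous expressions only. The abstract does not state the tagging scheme, so both the scheme and the function name `extract_mwes` are illustrative assumptions rather than the method used in the paper.

```python
from typing import List, Tuple


def extract_mwes(tokens: List[str], tags: List[str]) -> List[Tuple[str, ...]]:
    """Group tokens into MWEs from per-token tags.

    Assumption (not from the paper): tags follow a plain B/I/O scheme and
    each MWE is a contiguous run of one "B" followed by one or more "I".
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    mwes: List[Tuple[str, ...]] = []
    current: List[str] = []  # tokens of the expression being built

    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new expression starts; flush the previous one if it is multiword.
            if len(current) > 1:
                mwes.append(tuple(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", or an "I" with no open expression: close whatever is open.
            if len(current) > 1:
                mwes.append(tuple(current))
            current = []

    # Flush an expression that runs to the end of the sentence.
    if len(current) > 1:
        mwes.append(tuple(current))
    return mwes


tokens = ["He", "kicked", "the", "bucket", "yesterday"]
tags = ["O", "B", "I", "I", "O"]  # hypothetical model output
print(extract_mwes(tokens, tags))
```

In this framing, any of the models named in the abstract would be responsible for predicting the tag sequence, and a grouping step like this one would only turn those predictions into expressions.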
