Multiword expression identification using deep learning

Multiword expressions combine words in various ways to produce phrases whose properties cannot be predicted from the properties of their individual words or from the words' normal mode of combination. There are many types of multiword expressions, including proverbs, named entities, and verb-noun combinations. In this thesis, we propose several deep learning models to identify multiword expressions and compare their performance with that of more traditional machine learning models and current multiword expression identification systems. We show that convolutional neural networks outperform the state of the art, with the three-hidden-layer convolutional neural network performing best. To our knowledge, this is the first work to apply deep learning models to broad multiword expression identification.
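The abstract does not describe the architecture in detail. The sketch below is only a rough illustration of the core operation in a convolutional tagger. A filter slides over a window of word embeddings centred on each token and produces features, and a sigmoid layer turns those features into a per-token probability that the token belongs to a multiword expression. Several details are assumptions made for illustration: the function names `conv1d_same` and `token_mwe_probs`, the toy 2-dimensional embeddings, the single convolutional layer, the fixed weights, and the binary per-token framing. This is not the thesis's three-hidden-layer model, and in practice the weights would be learned.

```python
import math
from typing import List

Vector = List[float]


def conv1d_same(embeddings: List[Vector], kernel: List[List[Vector]],
                bias: List[float]) -> List[Vector]:
    """1D convolution along the token axis with zero 'same' padding and ReLU.

    embeddings: n_tokens x emb_dim
    kernel:     n_filters x width x emb_dim (an odd width is assumed)
    bias:       n_filters
    returns:    n_tokens x n_filters
    """
    width = len(kernel[0])
    half = width // 2
    zero = [0.0] * len(embeddings[0])
    out = []
    for i in range(len(embeddings)):
        # Window centred on token i, zero-padded past the sentence boundaries.
        window = [embeddings[j] if 0 <= j < len(embeddings) else zero
                  for j in range(i - half, i - half + width)]
        feats = []
        for f, filt in enumerate(kernel):
            s = bias[f]
            for w_vec, x_vec in zip(filt, window):
                s += sum(w * x for w, x in zip(w_vec, x_vec))
            feats.append(max(0.0, s))  # ReLU activation
        out.append(feats)
    return out


def token_mwe_probs(embeddings: List[Vector], kernel: List[List[Vector]],
                    bias: List[float], out_weights: Vector,
                    out_bias: float) -> List[float]:
    """Hypothetical output layer: sigmoid over each token's conv features."""
    hidden = conv1d_same(embeddings, kernel, bias)
    return [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(out_weights, feats))
                                    + out_bias)))
            for feats in hidden]


# Toy usage: 4 tokens with made-up 2-d embeddings, one filter of width 3.
EMB = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
KERNEL = [[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]]
BIAS = [0.0]

hidden = conv1d_same(EMB, KERNEL, BIAS)
print(hidden)  # → [[0.0], [3.0], [1.0], [1.0]]
probs = token_mwe_probs(EMB, KERNEL, BIAS, out_weights=[1.0], out_bias=-2.0)
```

Stacking further convolutional layers means feeding `hidden` back into another `conv1d_same` call. Each added layer widens the span of context that a token's score can depend on, which is one plausible reason deeper convolutional stacks can help with longer expressions.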