Do Character-Level Neural Network Language Models Capture Knowledge of Multiword Expression Compositionality?

In this paper, we propose the first model for multiword expression (MWE) compositionality prediction based on character-level neural network language models. Experimental results on two kinds of MWEs (noun compounds and verb-particle constructions) and two languages (English and German) suggest that character-level neural network language models capture knowledge of multiword expression compositionality, in particular for English noun compounds and the particle component of English verb-particle constructions. In contrast to many other approaches to MWE compositionality prediction, this character-level approach does not require token-level identification of MWEs in a training corpus, and can potentially predict the compositionality of out-of-vocabulary MWEs.
