Better Word Representations with Recursive Neural Networks for Morphology

Vector-space word representations have been very successful in recent years at improving performance across a variety of NLP tasks. However, in most existing work, words are treated as independent entities, and no explicit relationship among morphologically related words is modeled. As a result, rare and complex words are often poorly estimated, and all unknown words are represented in a rather crude way using only one or a few vectors. This paper addresses this shortcoming by proposing a novel model that is capable of building representations for morphologically complex words from their morphemes. We combine recursive neural networks (RNNs), where each morpheme is a basic unit, with neural language models (NLMs) to take contextual information into account when learning morphologically aware word representations. Our learned models outperform existing word representations by a good margin on word similarity tasks across many datasets, including a new dataset we introduce, focused on rare words, that complements existing ones.
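
To make the idea concrete, below is a minimal sketch of recursive composition over morphemes, assuming a tanh unit with a single learned weight matrix `W` and bias `b`; the dimensions, parameter names, and left-to-right folding order are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical dimensions and parameters; the abstract does not fix these.
d = 50                                       # morpheme/word embedding size
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, 2 * d))   # composition matrix (learned in practice)
b = np.zeros(d)                              # composition bias
morpheme_vecs = {                            # toy morpheme embeddings
    "un": rng.normal(size=d),
    "fortunate": rng.normal(size=d),
    "ly": rng.normal(size=d),
}

def compose(left, right):
    """Combine two child vectors into one parent vector (tanh RNN unit)."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def word_vector(morphemes):
    """Build a word vector by folding its morphemes, e.g. ((un, fortunate), ly)."""
    vec = morpheme_vecs[morphemes[0]]
    for m in morphemes[1:]:
        vec = compose(vec, morpheme_vecs[m])
    return vec

v = word_vector(["un", "fortunate", "ly"])   # vector for "unfortunately"
```

In the full model, the parameters of such a composition function would be trained jointly with an NLM objective, so that the morpheme-built vectors are also predictive of surrounding context.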
