Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations

The distributional hypothesis lies at the root of most existing word representation models, which infer the meaning of a word from its external contexts. However, distributional models cannot handle rare and morphologically complex words well, and they fail to identify some fine-grained linguistic regularities because they ignore word forms. In contrast, morphology holds that words are built from basic units, i.e., morphemes. The meaning and function of a rare word can therefore be inferred from words sharing the same morphemes, and many syntactic relations can be identified directly from word forms. The limitation of morphology, however, is that it cannot infer the relationship between two words that share no morphemes. Considering the advantages and limitations of both approaches, we propose two novel models, called BEING and SEING, that build better word representations by modeling both external contexts and internal morphemes in a jointly predictive way. Following distributed morphology theory, both models can also be extended to learn phrase representations. We evaluate the proposed models on similarity tasks and analogy tasks, and the results demonstrate that they significantly outperform state-of-the-art models on both word and phrase representation learning.
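
As a rough illustration of what a "jointly predictive" objective over external contexts and internal morphemes could look like, here is a minimal CBOW-style sketch in Python. The toy vocabulary, the segmentation dictionary, the weighting constant LAMBDA, and the use of a full softmax are all assumptions made for this example, not the paper's actual BEING or SEING formulation.

```python
import numpy as np

# Minimal sketch (not the paper's exact models): an averaged context vector
# jointly predicts both the target word (external evidence) and the target's
# morphemes (internal evidence). Toy data and LAMBDA are assumptions.

rng = np.random.default_rng(0)
DIM, LAMBDA = 50, 0.5

vocab = ["he", "is", "very", "unfriendly", "friendly", "friend", "kind"]
morph_vocab = ["un-", "friend", "-ly", "kind"]
segmentation = {                      # assumed morpheme segmentation
    "unfriendly": ["un-", "friend", "-ly"],
    "friendly": ["friend", "-ly"],
    "friend": ["friend"],
    "kind": ["kind"],
}

w2i = {w: i for i, w in enumerate(vocab)}
m2i = {m: i for i, m in enumerate(morph_vocab)}

W_ctx = rng.normal(scale=0.1, size=(len(vocab), DIM))        # context vectors
W_tgt = rng.normal(scale=0.1, size=(len(vocab), DIM))        # target vectors
W_mrp = rng.normal(scale=0.1, size=(len(morph_vocab), DIM))  # morpheme vectors

def neg_log_softmax(h, out, gold):
    """-log p(gold | h) under a full softmax over the rows of `out`."""
    logits = out @ h
    logits -= logits.max()            # numerical stability
    return np.log(np.exp(logits).sum()) - logits[gold]

def joint_loss(target, context):
    """External term: predict the target word from its averaged context.
    Internal term: predict each of the target's morphemes from the same
    context, so morphologically related words share gradient signal."""
    h = np.mean([W_ctx[w2i[c]] for c in context], axis=0)
    loss = neg_log_softmax(h, W_tgt, w2i[target])
    for m in segmentation.get(target, []):
        loss += LAMBDA * neg_log_softmax(h, W_mrp, m2i[m])
    return loss

print(joint_loss("unfriendly", ["he", "is", "very"]))
```

In this sketch, a rare word such as "unfriendly" receives gradient signal through the shared morpheme vectors for "friend" and "-ly", which is the mechanism the abstract appeals to for handling rare and morphologically complex words.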
