We describe a method for learning word embeddings with stochastic dimensionality. Our Infinite Skip-Gram (iSG) model specifies an energy-based joint distribution over a word vector, a context vector, and their dimensionality. By employing the same techniques used to make the Infinite Restricted Boltzmann Machine (Côté & Larochelle, 2015) tractable, we define vector dimensionality over a countably infinite domain, allowing vectors to grow as needed during training. After training, we find that the distribution over embedding dimensionality for a given word is highly interpretable and leads to an elegant probabilistic mechanism for word sense induction. We show qualitatively and quantitatively that the iSG produces parameter-efficient representations that are robust to language's inherent ambiguity.
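For intuition, the joint distribution described above can be sketched roughly as follows. This is only a schematic reading of the abstract, written by analogy with the iRBM's per-unit energy penalty, not the paper's exact formulation; the symbols $w$ (word vector), $c$ (context vector), $z$ (dimensionality), and the per-dimension penalties $\beta_i$ are illustrative assumptions:

\[
p(w, c, z) \;\propto\; \exp\{-E(w, c, z)\},
\qquad
E(w, c, z) \;=\; -\sum_{i=1}^{z} w_i c_i \;+\; \sum_{i=1}^{z} \beta_i,
\qquad z \in \{1, 2, 3, \dots\},
\]
\[
p(z \mid w) \;=\; \frac{\sum_{c} \exp\{-E(w, c, z)\}}{\sum_{z'=1}^{\infty} \sum_{c} \exp\{-E(w, c, z')\}}.
\]

In a sketch of this form, if $\beta_i - w_i c_i \ge \delta > 0$ for all dimensions beyond those reached during training (dimensions never touched have $w_i = c_i = 0$), the unnormalized weights decay geometrically in $z$, so the infinite sums converge and vectors can still recruit new dimensions whenever the data warrant it. The per-word distribution $p(z \mid w)$ is then the interpretable quantity over embedding dimensionality that the abstract describes.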

[1] Geoffrey E. Hinton et al. A Scalable Hierarchical Distributed Language Model. NIPS, 2008.

[2] Andrew McCallum et al. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. EMNLP, 2014.

[3] Geoffrey Zweig et al. Linguistic Regularities in Continuous Space Word Representations. NAACL, 2013.

[4] Elia Bruni et al. Multimodal Distributional Semantics. J. Artif. Intell. Res., 2014.

[5] Enhong Chen et al. A Probabilistic Model for Learning Multi-Prototype Word Embeddings. COLING, 2014.

[6] Andrew McCallum et al. Word Representations via Gaussian Embedding. ICLR, 2014.

[7] Marc-Alexandre Côté and Hugo Larochelle. An Infinite Restricted Boltzmann Machine. Neural Computation, 2015.

[8] Georgiana Dinu et al. Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. ACL, 2014.

[9] Raymond J. Mooney et al. Multi-Prototype Vector-Space Models of Word Meaning. NAACL, 2010.

[10] Yoshua Bengio et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.

[11] Andrew Y. Ng et al. Improving Word Representations via Global Context and Multiple Word Prototypes. ACL, 2012.

[12] Yoshua Bengio et al. Word Representations: A Simple and General Method for Semi-Supervised Learning. ACL, 2010.

[13] Larry P. Heck et al. Learning deep structured semantic models for web search using clickthrough data. CIKM, 2013.

[14] Ruslan Salakhutdinov et al. A Multiplicative Model for Learning Distributed Text-Based Attribute Representations. NIPS, 2014.

[15] Zhiyuan Liu et al. A Unified Model for Word Sense Representation and Disambiguation. EMNLP, 2014.

[16] Jeffrey Dean et al. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.

[17] Anton Osokin et al. Breaking Sticks and Ambiguities with Adaptive Skip-gram. AISTATS, 2015.

[18] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation. EMNLP, 2014.

[19] Yoshua Bengio et al. A Neural Probabilistic Language Model. J. Mach. Learn. Res., 2003.

[20] Ehud Rivlin et al. Placing search in context: the concept revisited. TOIS, 2002.