Document classification with spherical word vectors

Recent research shows that low-dimensional continuous representations of words (word vectors) can be successfully employed to classify documents, and that document vectors derived from semantic clustering outperform those derived from simple average pooling. Separately, our recent study demonstrated that embedding words on a hypersphere yields better performance on tasks such as semantic relatedness and bilingual word translation than the original approach, which embeds words in an unconstrained Euclidean space. In this paper, spherical word vectors are applied to the document classification task. The experiments show that spherical word vectors deliver good performance when combined with semantic clustering based on von Mises-Fisher (vMF) distributions.
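As a rough sketch of the pipeline the abstract describes (not the authors' implementation, and all function names and parameters here are illustrative): word vectors are projected onto the unit hypersphere, clustered by direction with spherical k-means (the hard-assignment limit of a vMF mixture with equal concentrations), and each document is then represented by its occupancy histogram over the learned clusters.

```python
import numpy as np

def spherical_kmeans(X, k, iters=50, seed=0):
    """Cluster unit-norm vectors by cosine similarity.

    This is hard-assignment EM for a mixture of von Mises-Fisher
    distributions with equal, large concentration parameters.
    """
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # project onto the sphere
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = (X @ centers.T).argmax(axis=1)        # nearest mean direction
        for j in range(k):
            members = X[assign == j]
            if len(members):
                m = members.sum(axis=0)
                centers[j] = m / np.linalg.norm(m)     # renormalized mean direction
    return centers, assign

def doc_vector(word_vecs, centers):
    """Represent a document as its normalized cluster-occupancy histogram."""
    W = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    assign = (W @ centers.T).argmax(axis=1)
    hist = np.bincount(assign, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The resulting histograms can be fed to any standard classifier (e.g. an SVM); the key difference from average pooling is that the document representation keeps track of which semantic directions its words fall into.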
