Unsupervised Document Embedding With CNNs

We propose a new model for unsupervised document embedding. Leading existing approaches either require complex inference or rely on recurrent neural networks (RNNs), which are difficult to parallelize. We take a different route and develop a convolutional neural network (CNN) embedding model. Our CNN architecture is fully parallelizable, resulting in over a 10x speedup in inference time over RNN models. The parallelizable architecture also allows us to train deeper models, in which each successive layer has an increasingly larger receptive field and models longer-range semantic structure within the document. We additionally propose a fully unsupervised learning algorithm to train this model based on stochastic forward prediction. Empirical results on two public benchmarks show that our approach achieves accuracy comparable to the state of the art at a fraction of the computational cost.
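To make the architecture and the training objective concrete, below is a minimal sketch in PyTorch. It is our own illustration, not the paper's code: the class ConvDocEncoder, the function forward_prediction_loss, the dilation-doubling scheme, and every hyperparameter (emb_dim, hidden, layers, kernel, horizon) are assumptions chosen to match the abstract's description of a fully parallel CNN whose receptive field grows with depth, trained by embedding a document prefix and predicting the words that follow it.

```python
# Minimal sketch, not the authors' implementation. Assumes PyTorch; all
# names and hyperparameters below are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvDocEncoder(nn.Module):
    """Stack of causal 1-D convolutions over word embeddings.

    Dilation doubles per layer (an assumption; any widening scheme fits
    the abstract's claim), so the receptive field grows with depth while
    every position is computed in parallel, with no recurrence.
    """
    def __init__(self, vocab_size, emb_dim=300, hidden=600, layers=4, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList()
        in_ch = emb_dim
        for i in range(layers):
            pad = (kernel - 1) * 2 ** i          # padding needed for causality
            self.convs.append(nn.Conv1d(in_ch, hidden, kernel,
                                        dilation=2 ** i, padding=pad))
            in_ch = hidden

    def forward(self, tokens):                   # tokens: (batch, seq)
        x = self.embed(tokens).transpose(1, 2)   # (batch, emb_dim, seq)
        for conv in self.convs:
            # Conv1d pads both ends; trimming the right end keeps only
            # outputs that depend on current and past tokens (causal).
            x = F.relu(conv(x)[:, :, :tokens.size(1)])
        return x                                 # (batch, hidden, seq)

def forward_prediction_loss(encoder, out_embed, tokens, horizon=5):
    """Stochastic forward prediction (sketch): embed a random prefix of
    each document, then score the next `horizon` words against the
    vocabulary via an output embedding matrix."""
    batch, seq = tokens.shape                    # assumes seq > horizon + 1
    split = torch.randint(1, seq - horizon, (1,)).item()
    doc_vec = encoder(tokens[:, :split])[:, :, -1]   # prefix embedding
    logits = doc_vec @ out_embed.weight.t()          # (batch, vocab)
    loss = 0.0
    for t in range(horizon):                         # words after the prefix
        loss = loss + F.cross_entropy(logits, tokens[:, split + t])
    return loss / horizon
```

Because all convolution positions are evaluated at once, inference is a single parallel forward pass over the document, in contrast to the sequential state updates of an RNN encoder; the encoder state at the final position can then serve as the document embedding.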
