Self-Discriminative Learning for Unsupervised Document Embedding

Unsupervised document representation learning is an important task that provides pre-trained features for NLP applications. Unlike most previous work, which learns embeddings by self-prediction of the surface text, we explicitly exploit inter-document information and directly model the relations between documents in embedding space with a discriminative network and a novel objective. Extensive experiments on both small and large public datasets show the competitiveness of the proposed method. In evaluations on standard document classification, our model achieves 5% to 13% lower error than state-of-the-art unsupervised embedding models. The reduction in error is even more pronounced in the scarce-label setting.
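To make the idea concrete, here is a minimal sketch of how a discriminative objective over document pairs could be set up in PyTorch. It assumes a simple formulation in which an encoder maps each document to an embedding and a small discriminator learns to tell whether two embeddings come from the same document (e.g., its two halves) or from different documents. The names DocEncoder and PairDiscriminator, the bag-of-words input, and the in-batch negative pairing are illustrative assumptions, not the paper's exact model.

```python
# Hypothetical sketch of self-discriminative document embedding training:
# an encoder produces one vector per document, and a discriminator is
# trained to decide whether two vectors come from the same document
# (its two halves) or from different documents. Names and architecture
# are illustrative assumptions, not the paper's released code.
import torch
import torch.nn as nn

class DocEncoder(nn.Module):
    """Maps a bag-of-words vector to a dense document embedding."""
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, 256), nn.ReLU(),
            nn.Linear(256, dim),
        )

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        return self.net(bow)

class PairDiscriminator(nn.Module):
    """Scores whether two embeddings belong to the same document."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([a, b], dim=-1)).squeeze(-1)

def training_step(encoder, disc, opt, first_half, second_half):
    """One step: halves of the same document are positive pairs; a
    shifted pairing within the batch supplies negative pairs."""
    z_a = encoder(first_half)                    # embeddings of first halves
    z_b = encoder(second_half)                   # embeddings of second halves
    pos_logits = disc(z_a, z_b)                  # same-document pairs
    neg_logits = disc(z_a, z_b.roll(1, dims=0))  # mismatched pairs
    loss_fn = nn.BCEWithLogitsLoss()
    loss = (loss_fn(pos_logits, torch.ones_like(pos_logits)) +
            loss_fn(neg_logits, torch.zeros_like(neg_logits)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    vocab, batch = 1000, 32
    enc, disc = DocEncoder(vocab), PairDiscriminator()
    opt = torch.optim.Adam(
        list(enc.parameters()) + list(disc.parameters()), lr=1e-3)
    # Random stand-ins for bag-of-words counts of document halves.
    a = torch.rand(batch, vocab)
    b = torch.rand(batch, vocab)
    print(training_step(enc, disc, opt, a, b))
```

Jointly training the encoder and discriminator this way pushes embeddings of related text together and unrelated text apart, which is the inter-document signal the abstract refers to; the paper's actual objective and negative-sampling scheme may differ from this sketch.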
