Generative and Discriminative Text Classification with Recurrent Neural Networks

We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However, we also find that generative models approach their asymptotic error rate more rapidly than their discriminative counterparts; this is the same pattern that Ng & Jordan (2001) proved holds for linear classification models that make more naive conditional independence assumptions. Building on this finding, we hypothesize that RNN-based generative classification models will be more robust to shifts in the data distribution. This hypothesis is confirmed in a series of experiments in zero-shot and continual learning settings that show that generative models substantially outperform discriminative models.
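
The comparison in the abstract is between a discriminative LSTM that models p(y | x) directly and a generative LSTM that models the joint p(x, y) = p(x | y) p(y) with a label-conditioned language model and classifies via Bayes' rule. The PyTorch sketch below is only meant to make that distinction concrete; the module names, the label-embedding conditioning, and all dimensions are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch (not the authors' code) of the two model families compared in
# the abstract. Class names, conditioning scheme, and dimensions are assumptions.
import torch
import torch.nn as nn


class DiscriminativeLSTM(nn.Module):
    """Models p(y | x): encode the document with an LSTM, classify from the final state."""

    def __init__(self, vocab_size, emb_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):  # tokens: (batch, seq_len) word ids
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.out(h[-1])  # unnormalized class scores


class GenerativeLSTM(nn.Module):
    """Models p(x | y) p(y) with a label-conditioned LSTM language model."""

    def __init__(self, vocab_size, emb_dim, hidden_dim, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.label_embed = nn.Embedding(num_classes, emb_dim)
        self.lstm = nn.LSTM(2 * emb_dim, hidden_dim, batch_first=True)
        self.vocab_out = nn.Linear(hidden_dim, vocab_size)
        self.log_prior = nn.Parameter(torch.zeros(num_classes))  # learned log p(y)

    def log_joint(self, tokens, labels):
        """log p(x, y): sum of next-token log-probabilities plus the class log-prior."""
        batch, seq_len = tokens.shape
        lab = self.label_embed(labels).unsqueeze(1).expand(batch, seq_len - 1, -1)
        inp = torch.cat([self.embed(tokens[:, :-1]), lab], dim=-1)
        hidden, _ = self.lstm(inp)
        log_probs = torch.log_softmax(self.vocab_out(hidden), dim=-1)
        token_ll = log_probs.gather(-1, tokens[:, 1:].unsqueeze(-1)).squeeze(-1)
        return token_ll.sum(dim=1) + torch.log_softmax(self.log_prior, dim=0)[labels]

    def classify(self, tokens, num_classes):
        """Bayes' rule: argmax_y log p(x | y) + log p(y)."""
        scores = [
            self.log_joint(tokens, tokens.new_full((tokens.size(0),), y))
            for y in range(num_classes)
        ]
        return torch.stack(scores, dim=1).argmax(dim=1)
```

Under this reading, the discriminative model would be trained with cross-entropy on its class scores, while the generative model would be trained to maximize log p(x, y) of the gold label (language-model cross-entropy plus the prior term); at test time both pick the highest-scoring class.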

[1] John Duchi, Elad Hazan, and Yoram Singer. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 2011.

[2] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. Bag of Tricks for Efficient Text Classification. EACL, 2017.

[3] Hermann Ney. On the Probabilistic Interpretation of Neural Network Classifiers and Discriminative Training Criteria. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995.

[4] Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. On Using Very Large Target Vocabulary for Neural Machine Translation. ACL, 2015.

[5] Y. Dan Rubinstein and Trevor J. Hastie. Discriminative vs Informative Learning. KDD, 1997.

[6] Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. Neural Computation, 1997.

[7] Alexis Conneau, Holger Schwenk, Loïc Barrault, and Yann LeCun. Very Deep Convolutional Networks for Text Classification. EACL, 2017.

[8] Andrew Y. Ng and Michael I. Jordan. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes. NIPS, 2001.

[9] Yijun Xiao and Kyunghyun Cho. Efficient Character-level Document Classification by Combining Convolution and Recurrent Layers. arXiv, 2016.

[10] Frederic Morin and Yoshua Bengio. Hierarchical Probabilistic Neural Network Language Model. AISTATS, 2005.

[11] Andriy Mnih and Yee Whye Teh. A fast and simple algorithm for training neural probabilistic language models. ICML, 2012.

[12] James Kirkpatrick, Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 2017.

[13] Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level Convolutional Networks for Text Classification. NIPS, 2015.

[14] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global Vectors for Word Representation. EMNLP, 2014.

[15] Chrisantha Fernando et al. PathNet: Evolution Channels Gradient Descent in Super Neural Networks. arXiv, 2017.

[16] Michalis K. Titsias. One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities. NIPS, 2016.

[17] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. ICLR, 2017.

[18] Fuchun Peng and Dale Schuurmans. Combining Naive Bayes and n-Gram Language Models for Text Classification. ECIR, 2003.