Improving Neural Topic Models Using Knowledge Distillation

Topic models are often used to identify human-interpretable topics that help make sense of large document collections. We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers. Our modular method can be straightforwardly applied to any neural topic model to improve topic quality, which we demonstrate using two models with disparate architectures, obtaining state-of-the-art topic coherence. We show that our adaptable framework improves performance not only in the aggregate over all estimated topics, as is commonly reported, but also in head-to-head comparisons of aligned topics.
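
The abstract's central technical idea is distilling knowledge from a pretrained transformer (the teacher) into a neural topic model (the student). As a minimal sketch only, not the authors' implementation, the code below shows one standard way such an objective can be composed: the topic model's bag-of-words reconstruction loss is mixed with a temperature-scaled KL term that pulls the decoder's word distribution toward soft word scores supplied by a transformer teacher. The function name and all parameters (student_logits, bow, teacher_logits, alpha, temperature) are illustrative assumptions.

# Minimal sketch of a distillation-augmented reconstruction loss for a neural
# topic model. This is an illustration of the general technique, not the
# method released with the paper; all names here are assumptions.
import torch
import torch.nn.functional as F


def distilled_topic_loss(
    student_logits: torch.Tensor,   # (batch, vocab) decoder word scores from the topic model
    bow: torch.Tensor,              # (batch, vocab) observed bag-of-words counts
    teacher_logits: torch.Tensor,   # (batch, vocab) per-document word scores from a transformer teacher
    alpha: float = 0.5,             # weight on the distillation term
    temperature: float = 2.0,       # softens both distributions before comparing them
) -> torch.Tensor:
    # Standard reconstruction term: cross-entropy between the decoder's
    # word distribution and the observed word counts.
    log_probs = F.log_softmax(student_logits, dim=-1)
    recon = -(bow * log_probs).sum(dim=-1)

    # Distillation term: KL divergence from the teacher's softened word
    # distribution to the student's, scaled by T^2 so its gradient magnitude
    # stays comparable as the temperature changes.
    student_log_t = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_t = F.softmax(teacher_logits / temperature, dim=-1)
    distill = (temperature ** 2) * F.kl_div(
        student_log_t, teacher_t, reduction="none"
    ).sum(dim=-1)

    return ((1.0 - alpha) * recon + alpha * distill).mean()


# Toy usage with random tensors standing in for real model outputs.
vocab = 2000
student_logits = torch.randn(8, vocab, requires_grad=True)
teacher_logits = torch.randn(8, vocab)
bow = torch.poisson(torch.full((8, vocab), 0.05))
loss = distilled_topic_loss(student_logits, bow, teacher_logits)
loss.backward()

In this kind of formulation, alpha trades off fidelity to the observed word counts against agreement with the teacher, and the temperature softens both distributions so that the teacher's low-probability words still contribute signal, as in standard knowledge distillation.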
