暂无分享,去创建一个
[1] Yonatan Belinkov,et al. Linguistic Knowledge and Transferability of Contextual Representations , 2019, NAACL.
[2] Deyu Zhou,et al. Neural Topic Modeling with Bidirectional Adversarial Training , 2020, ACL.
[3] Hugo Gonçalo Oliveira,et al. Can Topic Modelling benefit from Word Sense Information? , 2016, LREC.
[4] Stan Matwin,et al. Improving the Interpretability of Deep Neural Networks with Knowledge Distillation , 2018, 2018 IEEE International Conference on Data Mining Workshops (ICDMW).
[5] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[6] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[7] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[8] I. Dan Melamed,et al. Models of translation equivalence among words , 2000, CL.
[9] Christopher Potts,et al. Learning Word Vectors for Sentiment Analysis , 2011, ACL.
[10] Tiejun Zhao,et al. Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation , 2020, ACL.
[11] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[12] Ed H. Chi,et al. Understanding and Improving Knowledge Distillation , 2020, ArXiv.
[13] Chong Wang,et al. Reading Tea Leaves: How Humans Interpret Topic Models , 2009, NIPS.
[14] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[15] Ken Lang,et al. NewsWeeder: Learning to Filter Netnews , 1995, ICML.
[16] Wanxiang Che,et al. TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing , 2020, ACL.
[17] Omer Levy,et al. What Does BERT Look at? An Analysis of BERT’s Attention , 2019, BlackboxNLP@ACL.
[18] Feng Nan,et al. Topic Modeling with Wasserstein Autoencoders , 2019, ACL.
[19] Andrew K. C. Wong,et al. Entropy and Distance of Random Graphs with Application to Structural Pattern Recognition , 1985, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[20] Ramesh Nallapati,et al. Coherence-Aware Neural Topic Modeling , 2018, EMNLP.
[21] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[22] Kai Yu,et al. Knowledge Distillation for Sequence Model , 2018, INTERSPEECH.
[23] Anna Rumshisky,et al. A Primer in BERTology: What We Know About How BERT Works , 2020, Transactions of the Association for Computational Linguistics.
[24] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[25] T. Landauer,et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .
[26] Phil Blunsom,et al. Neural Variational Inference for Text Processing , 2015, ICML.
[27] Mark Stevenson,et al. Evaluating Topic Coherence Using Distributional Semantics , 2013, IWCS.
[28] Rui Wang,et al. ATM: Adversarial-neural Topic Model , 2018, Inf. Process. Manag..
[29] Jimmy J. Lin,et al. Natural Language Generation for Effective Knowledge Distillation , 2019, EMNLP.
[30] John D. Lafferty,et al. Correlated Topic Models , 2005, NIPS.
[31] Shakir Mohamed,et al. Implicit Reparameterization Gradients , 2018, NeurIPS.
[32] David M. Blei,et al. Topic Modeling in Embedding Spaces , 2019, Transactions of the Association for Computational Linguistics.
[33] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..
[34] Dirk Hovy,et al. Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence , 2020, ArXiv.
[35] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.
[36] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[37] Noah A. Smith,et al. Neural Models for Documents with Metadata , 2017, ACL.
[38] Wei Liu,et al. Distilled Wasserstein Learning for Word Embedding and Topic Modeling , 2018, NeurIPS.
[39] Philip Resnik,et al. Adapting Topic Models using Lexical Associations with Tree Priors , 2017, EMNLP.
[40] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[41] Viet-An Nguyen,et al. Lexical and Hierarchical Topic Regression , 2013, NIPS.
[42] Timothy Baldwin,et al. Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality , 2014, EACL.
[43] Jimmy J. Lin,et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks , 2019, ArXiv.
[44] Jun'ichi Tsujii,et al. A Latent Concept Topic Model for Robust Topic Inference Using Word Embeddings , 2016, ACL.
[45] R'emi Louf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[46] Andrew McCallum,et al. Rethinking LDA: Why Priors Matter , 2009, NIPS.
[47] Charles A. Sutton,et al. Autoencoding Variational Inference For Topic Models , 2017, ICLR.
[48] Martin Jankowiak,et al. Pathwise Derivatives Beyond the Reparameterization Trick , 2018, ICML.
[49] Shiming Xiang,et al. Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection , 2018, ACM Multimedia.
[50] Sophie Burkhardt,et al. Decoupling Sparsity and Smoothness in the Dirichlet Variational Autoencoder Topic Model , 2019, J. Mach. Learn. Res..
[51] Aidong Zhang,et al. A Correlated Topic Model Using Word Embeddings , 2017, IJCAI.
[52] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[53] Qi Tian,et al. Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Jianhua Lin,et al. Divergence measures based on the Shannon entropy , 1991, IEEE Trans. Inf. Theory.