Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System
Ming Gong | Ze Yang | Linjun Shou | Daxin Jiang | Wutao Lin