Jian Jiao | Ruofei Zhang | Wenhao Lu
[1] Sanja Fidler, et al. Skip-Thought Vectors, 2015, NIPS.
[2] Christopher Joseph Pal, et al. Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning, 2018, ICLR.
[3] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[4] Dong Yu, et al. Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features, 2016, KDD.
[5] Felix Hill, et al. Learning Distributed Representations of Sentences from Unlabelled Data, 2016, NAACL.
[6] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[7] Shipeng Li, et al. Query-driven iterated neighborhood graph search for large scale indexing, 2012, ACM Multimedia.
[8] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[9] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, IEEE International Conference on Computer Vision (ICCV).
[10] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[11] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[12] Hassan Ghasemzadeh, et al. Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher, 2019, ArXiv.
[13] Xiaodong Liu, et al. Multi-Task Deep Neural Networks for Natural Language Understanding, 2019, ACL.
[14] Maosong Sun, et al. ERNIE: Enhanced Language Representation with Informative Entities, 2019, ACL.
[15] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[16] Daxin Jiang, et al. Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System, 2019, ArXiv.
[17] Ming-Wei Chang, et al. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, 2019.
[18] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.
[19] Xiaodong Liu, et al. Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding, 2019, ArXiv.
[20] Seyed Iman Mirzadeh, et al. Improved Knowledge Distillation via Teacher Assistant, 2020, AAAI.
[21] Rich Caruana, et al. Model compression, 2006, KDD '06.
[22] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[23] Larry P. Heck, et al. Learning deep structured semantic models for web search using clickthrough data, 2013, CIKM.
[24] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[25] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[26] Nan Hua, et al. Universal Sentence Encoder, 2018, ArXiv.
[27] Yelong Shen, et al. Learning semantic representations using convolutional neural networks for web search, 2014, WWW.
[28] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, ArXiv.
[29] Nicole Immorlica, et al. Locality-sensitive hashing scheme based on p-stable distributions, 2004, SCG '04.
[30] Ming-Wei Chang, et al. Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation, 2019, ArXiv.
[31] Honglak Lee, et al. An efficient framework for learning sentence representations, 2018, ICLR.
[32] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[33] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[34] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Holger Schwenk, et al. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, 2017, EMNLP.
[36] Yu Cheng, et al. Patient Knowledge Distillation for BERT Model Compression, 2019, EMNLP.
[37] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[38] Yuan Ni, et al. PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation, 2019, BioNLP@ACL.
[39] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[40] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.