Ali Ghodsi | Ali Saheb Pasand | Pranav Sharma | Aref Jafari | Mehdi Rezagholizadeh | Puneeth Salad
[1] Mehdi Rezagholizadeh, et al. Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax, 2021, FINDINGS.
[2] Mehdi Rezagholizadeh, et al. Towards Zero-Shot Knowledge Distillation for Natural Language Processing, 2020, EMNLP.
[3] B. Jafarpour, et al. Active Curriculum Learning, 2021, INTERNLP.
[4] Bernhard Schölkopf, et al. Unifying Distillation and Privileged Information, 2015, ICLR.
[5] Mehdi Rezagholizadeh, et al. MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation, 2021, ACL.
[6] Naftali Tishby, et al. Deep Learning and the Information Bottleneck Principle, 2015, IEEE Information Theory Workshop (ITW).
[7] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[8] Samet Oymak, et al. Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks, 2019, AISTATS.
[9] Mehdi Rezagholizadeh, et al. ALP-KD: Attention-Based Layer Projection for Knowledge Distillation, 2020, AAAI.
[10] Jang Hyun Cho, et al. On the Efficacy of Knowledge Distillation, 2019, IEEE/CVF International Conference on Computer Vision (ICCV).
[11] Kaisheng M. Wang, et al. PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation, 2021, ArXiv.
[12] Hassan Ghasemzadeh, et al. Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher, 2019, ArXiv.
[13] Jinwoo Shin, et al. Regularizing Class-Wise Predictions via Self-Knowledge Distillation, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[15] Qun Liu, et al. Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers, 2020, EMNLP.
[16] Xiaolin Hu, et al. Knowledge Distillation via Route Constrained Optimization, 2019, IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Dongpeng Chen, et al. Essence Knowledge Distillation for Speech Recognition, 2019, ArXiv.
[18] Stefano Soatto, et al. Entropy-SGD: Biasing Gradient Descent into Wide Valleys, 2016, ICLR.
[19] Xiaolin Hu, et al. Online Knowledge Distillation via Collaborative Learning, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Ali Ghodsi, et al. Annealing Knowledge Distillation, 2021, EACL.
[21] David D. Cox, et al. On the Information Bottleneck Theory of Deep Learning, 2018, ICLR.
[22] Samet Oymak, et al. Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?, 2018, ICML.
[23] Nam Soo Kim, et al. TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition, 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[24] Yevgen Chebotar, et al. Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition, 2016, INTERSPEECH.
[25] Kyoung-Woon On, et al. Toward General Scene Graph: Integration of Visual Semantic Knowledge with Entity Synset Alignment, 2020, ALVR.
[26] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.