Ankit Singh Rawat | Sashank J. Reddi | Aditya Krishna Menon | Seungyeon Kim | Sanjiv Kumar
[1] G. Bennett. Probability Inequalities for the Sum of Independent Random Variables, 1962.
[2] L. J. Savage. Elicitation of Personal Probabilities and Expectations, 1971.
[3] M. Schervish. A General Method for Comparing Probability Assessors, 1989.
[4] Jude W. Shavlik et al. In Advances in Neural Information Processing, 1996.
[5] Leo Breiman et al. Born Again Trees, 1996.
[6] Yoram Singer et al. Learning to Order Things, 1997, NIPS.
[7] Koby Crammer et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, 2002, J. Mach. Learn. Res.
[8] Yoram Singer et al. Log-Linear Models for Label Ranking, 2003, NIPS.
[9] A. Buja et al. Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications, 2005.
[10] Rich Caruana et al. Model Compression, 2006, KDD '06.
[11] Eyke Hüllermeier et al. Multilabel classification via calibrated label ranking, 2008, Machine Learning.
[12] Massimiliano Pontil et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[13] Cynthia Rudin et al. The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List, 2009, J. Mach. Learn. Res.
[14] Patrick Gallinari et al. Ranking with ordered weighted pairwise classification, 2009, ICML '09.
[15] Thomas Gärtner et al. Label Ranking Algorithms: A Survey, 2010, Preference Learning.
[16] Jure Leskovec et al. Hidden factors and hidden topics: understanding rating dimensions with review text, 2013, RecSys.
[17] Yifan Gong et al. Restructuring of deep neural network acoustic models with singular value decomposition, 2013, INTERSPEECH.
[18] S. V. N. Vishwanathan et al. Ranking via Robust Binary Classification, 2014, NIPS.
[19] Rich Caruana et al. Do Deep Nets Really Need to be Deep?, 2013, NIPS.
[20] Geoffrey E. Hinton et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[21] Prateek Jain et al. Sparse Local Embeddings for Extreme Multi-label Classification, 2015, NIPS.
[22] Razvan Pascanu et al. Policy Distillation, 2015, ICLR.
[23] Sergey Ioffe et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Ananthram Swami et al. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks, 2015, 2016 IEEE Symposium on Security and Privacy (SP).
[25] Meng Yang et al. Large-Margin Softmax Loss for Convolutional Neural Networks, 2016, ICML.
[26] Zhiyuan Tang et al. Recurrent neural network training with dark knowledge transfer, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Bernhard Schölkopf et al. Unifying distillation and privileged information, 2015, ICLR.
[28] Junmo Kim et al. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Razvan Pascanu et al. Sobolev Training for Neural Networks, 2017, NIPS.
[30] Bhiksha Raj et al. SphereFace: Deep Hypersphere Embedding for Face Recognition, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Kilian Q. Weinberger et al. On Calibration of Modern Neural Networks, 2017, ICML.
[32] David Lopez-Paz et al. Patient-Driven Privacy Control through Generalized Distillation, 2016, 2017 IEEE Symposium on Privacy-Aware Computing (PAC).
[33] Zachary Chase Lipton et al. Born Again Neural Networks, 2018, ICML.
[34] Hongyi Zhang et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.
[35] Kaiming He et al. Data Distillation: Towards Omni-Supervised Learning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[36] Ke Wang et al. Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System, 2018, KDD.
[37] Derek Hoiem et al. Learning without Forgetting, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[38] Jian Cheng et al. Additive Margin Softmax for Face Verification, 2018, IEEE Signal Processing Letters.
[39] Geoffrey E. Hinton et al. Large scale distributed neural network training through online distillation, 2018, ICLR.
[40] Bernt Schiele et al. Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[41] Ling Shao et al. Striking the Right Balance With Uncertainty, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Bin Dong et al. Distillation ≈ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network, 2019, ArXiv.
[43] Venkatesh Balasubramanian et al. Slice: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches, 2019, WSDM.
[44] Andreas Krause et al. Noise Regularization for Conditional Density Estimation, 2019, ArXiv.
[45] Haipeng Luo et al. Hypothesis Set Stability and Generalization, 2019, NeurIPS.
[46] Christoph H. Lampert et al. Towards Understanding Knowledge Distillation, 2019, ICML.
[47] Colin Wei et al. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, 2019, NeurIPS.
[48] Richard Socher et al. A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation, 2018, ICLR.
[49] Sebastian Bruch et al. An Analysis of the Softmax Cross Entropy Loss for Learning-to-Rank with Binary Relevance, 2019, ICTIR.
[50] Alan L. Yuille et al. Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students, 2018, AAAI.
[51] Konstantinos Kamnitsas et al. Overfitting of neural nets under class imbalance: Analysis and improvements for segmentation, 2019, MICCAI.
[52] R. Venkatesh Babu et al. Zero-Shot Knowledge Distillation in Deep Networks, 2019, ICML.
[53] Sashank J. Reddi et al. Stochastic Negative Mining for Learning with Large Output Spaces, 2018, AISTATS.
[54] Geoffrey E. Hinton et al. When Does Label Smoothing Help?, 2019, NeurIPS.
[55] Quoc V. Le et al. Self-Training With Noisy Student Improves ImageNet Classification, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Ed H. Chi et al. Understanding and Improving Knowledge Distillation, 2020, ArXiv.
[57] Search to Distill: Pearls Are Everywhere but Not the Eyes, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Hossein Mobahi et al. Self-Distillation Amplifies Regularization in Hilbert Space, 2020, NeurIPS.
[59] Sebastian Bruch et al. An Alternative Cross Entropy Loss for Learning-to-Rank, 2019, WWW.