Michael I. Jordan | Mingsheng Long | Kaichao You | Jianmin Wang
[1] Yann LeCun, et al. Second Order Properties of Error Surfaces: Learning Time and Generalization, 1990, NIPS.
[2] G. Griffin, et al. Caltech-256 Object Category Dataset, 2007.
[3] Antonio Torralba, et al. Recognizing indoor scenes, 2009, CVPR.
[4] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[5] Pietro Perona, et al. The Caltech-UCSD Birds-200-2011 Dataset, 2011.
[6] Marc Alexa, et al. How do humans sketch objects?, 2012, ACM Trans. Graph.
[7] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, arXiv.
[8] Yoshua Bengio, et al. How transferable are features in deep neural networks?, 2014, NIPS.
[9] Ivan Laptev, et al. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014, CVPR.
[10] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[11] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[12] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[13] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[14] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[15] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[16] Tengyu Ma, et al. Matrix Completion has No Spurious Local Minimum, 2016, NIPS.
[17] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[18] Leslie N. Smith. Cyclical Learning Rates for Training Neural Networks, 2017, WACV.
[19] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[20] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, CVPR.
[21] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[22] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[23] Kilian Q. Weinberger, et al. Snapshot Ensembles: Train 1, get M for free, 2017, ICLR.
[24] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[25] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[26] Carla P. Gomes, et al. Understanding Batch Normalization, 2018, NeurIPS.
[27] Kurt Keutzer, et al. Hessian-based Analysis of Large Batch Training and Robustness to Adversaries, 2018, NeurIPS.
[28] Xu Sun, et al. Adaptive Gradient Methods with Dynamic Bound of Learning Rate, 2019, ICLR.
[29] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[30] Colin Wei, et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks, 2019, NeurIPS.
[31] Fred Zhang, et al. SGD on Neural Networks Learns Functions of Increasing Complexity, 2019, NeurIPS.
[32] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[33] Matus Telgarsky, et al. The implicit bias of gradient descent on nonseparable data, 2019, COLT.
[34] Quoc V. Le, et al. Do Better ImageNet Models Transfer Better?, 2019, CVPR.
[35] Mingjie Sun, et al. Rethinking the Value of Network Pruning, 2018, ICLR.
[36] Vinay Uday Prabhu, et al. Do deep neural networks learn shallow learnable examples first?, 2019.
[37] Jon Kleinberg, et al. Transfusion: Understanding Transfer Learning for Medical Imaging, 2019, NeurIPS.
[38] Thomas Hofmann, et al. Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization, 2018, AISTATS.