Ethan X. Fang | Caleb Ju | Yan Li | Tuo Zhao
[1] Matus Telgarsky, et al. The implicit bias of gradient descent on nonseparable data, 2019, COLT.
[2] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm, 1976.
[3] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] A. Nemirovsky and D. Yudin. Problem Complexity and Method Efficiency in Optimization, 1983.
[5] Ji Zhu, et al. Boosting as a Regularized Path to a Maximum Margin Classifier, 2004, J. Mach. Learn. Res.
[6] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.
[7] Kilian Q. Weinberger, et al. Revisiting Few-sample BERT Fine-tuning, 2020, ArXiv.
[8] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.
[9] Shuicheng Yan, et al. Efficient Meta Learning via Minibatch Proximal Update, 2019, NeurIPS.
[10] Harri Valpola, et al. Weight-averaged consistency targets improve semi-supervised deep learning results, 2017, ArXiv.
[11] J. Zico Kolter, et al. Overfitting in adversarially robust deep learning, 2020, ICML.
[12] Matus Telgarsky, et al. Characterizing the implicit bias via a primal-dual analysis, 2021, ALT.
[13] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[14] Matus Telgarsky, et al. Margins, Shrinkage, and Boosting, 2013, ICML.
[15] Jianfeng Gao, et al. SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization, 2019, ACL.
[16] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[17] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[18] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[19] Leslie N. Smith, et al. A disciplined approach to neural network hyper-parameters: Part 1 - learning rate, batch size, momentum, and weight decay, 2018, ArXiv.
[20] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[21] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[22] Yurii Nesterov, et al. Relatively Smooth Convex Optimization by First-Order Methods, and Applications, 2016, SIAM J. Optim.
[23] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[24] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[25] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[26] Jiashi Feng, et al. Revisit Knowledge Distillation: a Teacher-free Framework, 2019, ArXiv.
[27] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[28] Zachary Chase Lipton, et al. Born Again Neural Networks, 2018, ICML.
[29] S. Kakade, et al. On the duality of strong convexity and strong smoothness: Learning applications and matrix regularization, 2009.
[30] Jascha Sohl-Dickstein, et al. The large learning rate phase of deep learning: the catapult mechanism, 2020, ArXiv.
[31] Craig M. Vineyard, et al. Distillation Strategies for Proximal Policy Optimization, 2019, ArXiv.
[32] Matus Telgarsky, et al. Gradient descent follows the regularization path for general losses, 2020, COLT.
[33] Xiangyu Zhang, et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design, 2018, ECCV.
[34] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[35] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[36] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[38] R. Tyrrell Rockafellar. Augmented Lagrangians and Applications of the Proximal Point Algorithm in Convex Programming, 1976, Math. Oper. Res.
[39] Jonathan Eckstein, et al. Nonlinear Proximal Point Algorithms Using Bregman Functions, with Applications to Convex Programming, 1993, Math. Oper. Res.
[40] R. French. Catastrophic forgetting in connectionist networks, 1999, Trends in Cognitive Sciences.
[41] K. Kiwiel. Proximal Minimization Methods with Generalized Bregman Functions, 1997.
[42] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[43] Derek Hoiem, et al. Learning without Forgetting, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[44] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[45] Colin Wei, et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks, 2019, NeurIPS.
[46] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[47] Benar Fux Svaiter, et al. Error bounds for proximal point subproblems and associated inexact proximal point algorithms, 2000, Math. Program.
[48] Babak Hassibi, et al. Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization, 2018, ICLR.
[49] Zhi Zhang, et al. Bag of Tricks for Image Classification with Convolutional Neural Networks, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[51] Alexander J. Zaslavski. Convergence of a Proximal Point Method in the Presence of Computational Errors in Hilbert Spaces, 2010, SIAM J. Optim.
[52] R. Monteiro, et al. Convergence rate of inexact proximal point methods with relative error criteria for convex optimization, 2010.
[53] Kim-Chuan Toh, et al. Bregman Proximal Point Algorithm Revisited: A New Inexact Version and its Variant, 2021.