On the Local Minima of the Empirical Risk
Michael I. Jordan | Rong Ge | Chi Jin | Lydia T. Liu
[1] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[2] Lin Xiao, et al. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback, 2010, COLT.
[3] Lawrence K. Saul, et al. Kernel Methods for Deep Learning, 2009, NIPS.
[4] Hariharan Narayanan, et al. Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions, 2015, COLT.
[5] John C. Duchi, et al. Certifiable Distributional Robustness with Principled Adversarial Training, 2017, arXiv.
[6] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.
[7] Jan Vondrák, et al. Information-theoretic lower bounds for convex optimization with erroneous oracles, 2015, NIPS.
[8] Yuchen Zhang, et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics, 2017, COLT.
[9] Anima Anandkumar, et al. Efficient approaches for escaping higher order saddle points in non-convex optimization, 2016, COLT.
[10] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[11] Peter Auer, et al. Exponentially many local minima for single neurons, 1995, NIPS.
[12] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[13] Anand D. Sarwate, et al. Differentially Private Empirical Risk Minimization, 2009, J. Mach. Learn. Res.
[14] Michael I. Jordan, et al. Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent, 2017, COLT.
[15] John C. Duchi, et al. Certifying Some Distributional Robustness with Principled Adversarial Training, 2017, ICLR.
[16] A. Montanari, et al. The landscape of empirical risk for nonconvex losses, 2016, The Annals of Statistics.
[17] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[18] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.
[19] Percy Liang, et al. Certified Defenses for Data Poisoning Attacks, 2017, NIPS.
[20] Martin J. Wainwright, et al. Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations, 2013, IEEE Transactions on Information Theory.
[21] Ingo Rechenberg, et al. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution [Evolution Strategy: Optimization of Technical Systems according to the Principles of Biological Evolution], 1973.
[22] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[23] Tengyu Ma, et al. Finding approximate local minima faster than gradient descent, 2016, STOC.
[24] Gábor Lugosi, et al. Concentration Inequalities - A Nonasymptotic Theory of Independence, 2013.
[25] Yuanzhi Li, et al. Algorithms and matching lower bounds for approximately-convex optimization, 2016, NIPS.
[26] Po-Ling Loh, et al. Regularized M-estimators with nonconvexity: statistical and algorithmic theory for local optima, 2013, J. Mach. Learn. Res.
[27] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[28] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[29] Yair Carmon, et al. Accelerated Methods for Non-Convex Optimization, 2016, SIAM J. Optim.
[30] Ohad Shamir, et al. On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization, 2012, COLT.
[31] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.