Stability and Generalization of Learning Algorithms that Converge to Global Optima