Memo No. 067, April 4, 2017. Theory of Deep Learning III: Generalization Properties of SGD, by
Tomaso Poggio | Noah Golowich | Karthik Sridharan | Brando Miranda | Qianli Liao | Alexander Rakhlin | Chiyuan Zhang
[1] S. Mitter, et al. Recursive stochastic algorithms for global optimization in R^d, 1991.
[2] Peter L. Bartlett, et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network, 1998, IEEE Trans. Inf. Theory.
[4] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[5] Lorenzo Rosasco, et al. Optimal Learning for Multi-pass Stochastic Gradient Methods, 2016, NIPS.
[6] Lorenzo Rosasco, et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review, 2016, International Journal of Automation and Computing.
[7] T. Poggio, et al. On optimal nonlinear associative recall, 1975, Biological Cybernetics.
[8] T. Poggio, et al. Theory of Deep Learning III: Generalization Properties of SGD, Memo No. 067, June 27, 2017.
[9] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[10] John N. Tsitsiklis, et al. Gradient Convergence in Gradient Methods with Errors, 1999, SIAM J. Optim.
[11] Andreas Maurer, et al. Bounds for Linear Multi-Task Learning, 2006, J. Mach. Learn. Res.
[12] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[13] W. Hackbusch, et al. On the Convergence of Alternating Least Squares Optimisation in Tensor Format Representations, 2015, arXiv:1506.00062.
[14] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[15] S. Mitter, et al. Metropolis-type annealing algorithms for global optimization in R^d, 1993.
[16] Tomaso A. Poggio, et al. Theory II: Landscape of the Empirical Risk in Deep Learning, 2017, ArXiv.
[17] B. Gidas. Global optimization via the Langevin equation, 1985, 24th IEEE Conference on Decision and Control.
[18] V. Koltchinskii, et al. Empirical margin distributions and bounding the generalization error of combined classifiers, 2002, math/0405343.
[19] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[20] P. Lennie. Receptive fields, 2003, Current Biology.
[21] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[22] Sanguthevar Rajasekaran, et al. On the Convergence Time of Simulated Annealing, 1990.
[23] Joshua B. Tenenbaum, et al. Human-level concept learning through probabilistic program induction, 2015, Science.
[24] Sayan Mukherjee, et al. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization, 2006, Adv. Comput. Math.
[25] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[26] Shie Mannor, et al. Robustness and Regularization of Support Vector Machines, 2008, J. Mach. Learn. Res.
[27] Ryota Tomioka, et al. Norm-Based Capacity Control in Neural Networks, 2015, COLT.
[28] Mikhail Belkin, et al. Diving into the shallows: a computational perspective on large-scale shallow learning, 2017, NIPS.
[29] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[30] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[31] T. Poggio, et al. Hierarchical models of object recognition in cortex, 1999, Nature Neuroscience.
[33] David M. Blei, et al. A Variational Analysis of Stochastic Gradient Algorithms, 2016, ICML.
[34] Shie Mannor, et al. Robustness and generalization, 2010, Machine Learning.
[35] D. Hubel, et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, 1962, The Journal of Physiology.
[37] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[38] Ambuj Tewari, et al. Smoothness, Low Noise and Fast Rates, 2010, NIPS.
[39] Hrushikesh Narhar Mhaskar, et al. Approximation properties of a multilayered feedforward artificial neural network, 1993, Adv. Comput. Math.
[40] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[41] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.
[42] Ruslan Salakhutdinov, et al. Geometry of Optimization and Implicit Regularization in Deep Learning, 2017, ArXiv.
[43] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[44] Kunihiko Fukushima, et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, 1980, Biological Cybernetics.
[45] Matus Telgarsky, et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis, 2017, COLT.
[46] Lorenzo Rosasco, et al. Learning with Incremental Iterative Regularization, 2014, NIPS.
[47] Gintare Karolina Dziugaite, et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, 2017, UAI.
[48] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999.
[49] Lorenzo Rosasco, et al. Deep Convolutional Networks are Hierarchical Kernel Machines, 2015, ArXiv.
[50] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[51] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[52] Sébastien Bubeck, et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.