Ioannis Mitliagkas | Simon Lacoste-Julien | Aristide Baratin | Brady Neal | Sarthak Mittal | Vinayak Tantia | Matthew Scicluna