Reconciling modern machine-learning practice and the classical bias–variance trade-off
Mikhail Belkin | Daniel Hsu | Siyuan Ma | Soumik Mandal
[1] David Haussler, et al. Occam's Razor, 1987, Inf. Process. Lett.
[2] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.
[3] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.
[4] Vladimir Naumovich Vapnik. The Nature of Statistical Learning Theory, 1995.
[5] Manfred Opper, et al. Dynamics of Training, 1996, NIPS.
[6] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.
[7] Peter L. Bartlett, et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network, 1998, IEEE Trans. Inf. Theory.
[8] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.
[9] B. Yu, et al. Boosting with the L2-loss: regression and classification, 2001.
[10] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.
[11] Adele Cutler, et al. PERT – Perfect Random Tree Ensembles, 2001.
[12] P. Bühlmann, et al. Boosting With the L2 Loss, 2003.
[13] L. Wasserman. All of Nonparametric Statistics, 2005.
[14] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[15] Gerd Gigerenzer, et al. Homo Heuristicus: Why Biased Minds Make Better Inferences, 2009, Top. Cogn. Sci.
[16] Carl E. Rasmussen, et al. Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.
[17] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[18] Eugenio Culurciello, et al. An Analysis of Deep Neural Network Models for Practical Applications, 2016, ArXiv.
[19] Andrew Gordon Wilson, et al. Learning Scalable Deep Kernels with Recurrent Structure, 2016, J. Mach. Learn. Res.
[20] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[21] David Mease, et al. Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers, 2015, J. Mach. Learn. Res.
[22] Lorenzo Rosasco, et al. Generalization Properties of Learning with Random Features, 2016, NIPS.
[23] Mikhail Belkin, et al. To understand deep learning we need to understand kernel learning, 2018, ICML.
[24] V Kishore Ayyadevara, et al. Gradient Boosting Machine, 2018.
[25] Ioannis Mitliagkas, et al. A Modern Take on the Bias-Variance Tradeoff in Neural Networks, 2018, ArXiv.
[26] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[27] Mikhail Belkin, et al. Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, 2018, NeurIPS.
[28] Hongyang Zhang, et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations, 2017, COLT.
[29] Raef Bassily, et al. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning, 2017, ICML.
[30] Levent Sagun, et al. A jamming transition from under- to over-parametrization affects generalization in deep learning, 2018, Journal of Physics A: Mathematical and Theoretical.
[32] Mikhail Belkin, et al. Does data interpolation contradict statistical optimality?, 2018, AISTATS.
[33] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.
[34] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.