Martin J. Wainwright | Raaz Dwivedi | Chandan Singh | Bin Yu
[1] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[2] Jorma Rissanen, et al. The Minimum Description Length Principle in Coding and Modeling, 1998, IEEE Trans. Inf. Theory.
[3] Bin Yu, et al. Model Selection and the Principle of Minimum Description Length, 2001.
[4] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[5] B. Efron. How Biased is the Apparent Error Rate of a Prediction Rule?, 1986.
[6] Junwei Lu, et al. On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond, 2018, ArXiv.
[7] R. Tibshirani, et al. Generalized Additive Models, 1986.
[8] B. Yu, et al. Boosting with the L2-Loss: Regression and Classification, 2001.
[9] Dean P. Foster, et al. The Contribution of Parameters to Stochastic Complexity, 2022.
[10] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[11] Luís Torgo, et al. OpenML: networked science in machine learning, 2014, SIGKDD Explorations.
[12] Yi Zhang, et al. Stronger generalization bounds for deep nets via a compression approach, 2018, ICML.
[13] Randal S. Olson, et al. PMLB: a large benchmark suite for machine learning evaluation and comparison, 2017, BioData Mining.
[14] Ming Li, et al. An Introduction to Kolmogorov Complexity and Its Applications, 2019, Texts in Computer Science.
[15] Jorma Rissanen, et al. Minimum Description Length Principle, 2010, Encyclopedia of Machine Learning.
[16] Antonia Maria Tulino, et al. Random Matrix Theory and Wireless Communications, 2004, Found. Trends Commun. Inf. Theory.
[17] Saharon Rosset, et al. When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples, 2014.
[18] G. Schwarz. Estimating the Dimension of a Model, 1978.
[19] Nairanjana Dasgupta, et al. Analyzing Categorical Data, 2004, Technometrics.
[20] Yi Ma, et al. Rethinking Bias-Variance Trade-off for Generalization of Neural Networks, 2020, ICML.
[21] Bo Zhang, et al. Generalized degrees of freedom and adaptive model selection in linear mixed-effects models, 2012, Comput. Stat. Data Anal.
[22] Sara A. van de Geer. Empirical Processes in M-Estimation, 2000.
[23] Tong Zhang. From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation, 2006, math/0702653.
[24] Arthur Jacot, et al. Neural tangent kernel: convergence and generalization in neural networks (invited paper), 2018, NeurIPS.
[25] Joel Nothman, et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, 2019, ArXiv.
[26] Vladimir Vapnik, Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.
[27] Colin L. Mallows, et al. Some Comments on Cp, 2000, Technometrics.
[28] C. Mallows. More comments on Cp, 1995.
[29] J. W. Silverstein. Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices, 1995.
[30] Ryota Tomioka, et al. Norm-Based Capacity Control in Neural Networks, 2015, COLT.
[31] Z. Bai, et al. Limit of the smallest eigenvalue of a large dimensional sample covariance matrix, 1993.
[32] R. Tibshirani, et al. Linear Smoothers and Additive Models, 1989.
[33] Lucas Janson, et al. Effective degrees of freedom: a flawed metaphor, 2015.
[34] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[35] Mary C. Meyer, et al. On the Degrees of Freedom in Shape-Restricted Regression, 2000.
[36] Yann LeCun, et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, 2018, ArXiv.
[37] Robert P. W. Duin, et al. Bagging for linear classifiers, 1998, Pattern Recognit.
[38] A. Kolmogorov. Three approaches to the quantitative definition of information, 1968.
[39] Xiaotong Shen, et al. Adaptive Model Selection, 2002.
[40] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[41] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[42] David A. McAllester, et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, 2017, ICLR.
[43] R. Tibshirani, et al. On the “degrees of freedom” of the lasso, 2007, arXiv:0712.0881.
[44] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[45] Anant Sahai, et al. Harmless interpolation of noisy data in regression, 2019, IEEE International Symposium on Information Theory (ISIT).
[46] Kuniaki Uehara, et al. Discovery of Time-Series Motif from Multi-Dimensional Data Based on MDL Principle, 2005, Machine Learning.
[47] Geoffrey E. Hinton, et al. Keeping the neural networks simple by minimizing the description length of the weights, 1993, COLT '93.
[48] Yann Ollivier, et al. The Description Length of Deep Learning models, 2018, NeurIPS.
[49] V. Marčenko, et al. Distribution of eigenvalues for some sets of random matrices, 1967.
[50] Jason Yosinski, et al. Measuring the Intrinsic Dimension of Objective Landscapes, 2018, ICLR.
[51] Paul M. B. Vitányi, et al. An Introduction to Kolmogorov Complexity and Its Applications, 1993, Graduate Texts in Computer Science.
[52] P. Bühlmann, et al. Boosting With the L2 Loss, 2003.
[53] Bin Yu, et al. Three principles of data science: predictability, computability, and stability (PCS), 2019.
[54] Robert P. W. Duin, et al. Bagging and the Random Subspace Method for Redundant Feature Spaces, 2001, Multiple Classifier Systems.
[55] Ohad Shamir, et al. Size-Independent Sample Complexity of Neural Networks, 2017, COLT.
[56] H. Akaike. A new look at the statistical model identification, 1974.
[57] A. Shiryayev. On Tables of Random Numbers, 1993.
[58] Jürgen Schmidhuber, et al. Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability, 1997, Neural Networks.
[59] Dumitru Erhan, et al. Going deeper with convolutions, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Ryan J. Tibshirani, et al. Degrees of freedom and model search, 2014, arXiv:1402.1920.
[61] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[62] Peter Grünwald, et al. A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity, 2017, ALT.
[63] J. Rissanen. Stochastic Complexity and Modeling, 1986.
[64] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[65] Trevor Hastie, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2009, Springer.
[66] R. Tibshirani, et al. Degrees of freedom in lasso problems, 2011, arXiv:1111.0653.
[67] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[68] Martin J. Wainwright, et al. Early stopping for non-parametric regression: An optimal data-dependent stopping rule, 2011, 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[69] Kenji Yamanishi, et al. High-dimensional penalty selection via minimum description length principle, 2018, Machine Learning.
[70] Robert P. W. Duin, et al. Bagging, Boosting and the Random Subspace Method for Linear Classifiers, 2002, Pattern Analysis & Applications.
[71] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Sanjeev Arora, et al. Implicit Regularization in Deep Matrix Factorization, 2019, NeurIPS.
[73] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999.
[74] Boaz Barak, et al. Deep double descent: where bigger models and more data hurt, 2019, ICLR.