暂无分享,去创建一个
[1] Andrea Montanari,et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation , 2019, Annals of statistics.
[2] John Moody,et al. Note on generalization, regularization and architecture selection in nonlinear learning systems , 1991, Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop.
[3] Andrew Gordon Wilson,et al. GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration , 2018, NeurIPS.
[4] Gintare Karolina Dziugaite,et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data , 2017, UAI.
[5] Stefano Soatto,et al. Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.
[6] Andrew Gordon Wilson,et al. A Simple Baseline for Bayesian Uncertainty in Deep Learning , 2019, NeurIPS.
[7] Boaz Barak,et al. Deep double descent: where bigger models and more data hurt , 2019, ICLR.
[8] Wolfgang Kinzel,et al. Basins of attraction near the critical storage capacity for neural networks with constant stabilities , 1989 .
[9] Stephen F. Gull,et al. Developments in Maximum Entropy Data Analysis , 1989 .
[10] Andrea Montanari,et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve , 2019, Communications on Pure and Applied Mathematics.
[11] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[12] David Madras,et al. Detecting Extrapolation with Local Ensembles , 2020, ICLR.
[13] Jürgen Schmidhuber,et al. Flat Minima , 1997, Neural Computation.
[14] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Ryan P. Adams,et al. Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach , 2018, ICLR.
[16] Vardan Papyan,et al. The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size. , 2018 .
[17] Shankar Krishnan,et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density , 2019, ICML.
[18] Andrew Gordon Wilson,et al. Subspace Inference for Bayesian Deep Learning , 2019, UAI.
[19] W. Cleveland. Robust Locally Weighted Regression and Smoothing Scatterplots , 1979 .
[20] Eric R. Ziegel,et al. Generalized Linear Models , 2002, Technometrics.
[21] Andrew Gordon Wilson,et al. Bayesian Deep Learning and a Probabilistic Perspective of Generalization , 2020, NeurIPS.
[22] Hossein Mobahi,et al. Fantastic Generalization Measures and Where to Find Them , 2019, ICLR.
[23] Yann LeCun,et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond , 2016, 1611.07476.
[24] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[25] R. Tibshirani,et al. Generalized Additive Models , 1986 .
[26] Taiji Suzuki,et al. Fast generalization error bound of deep learning from a kernel perspective , 2018, AISTATS.
[27] Jaehoon Lee,et al. Wide neural networks of any depth evolve as linear models under gradient descent , 2019, NeurIPS.
[28] Noah D. Goodman,et al. Pyro: Deep Universal Probabilistic Programming , 2018, J. Mach. Learn. Res..
[29] Geoffrey E. Hinton,et al. Keeping the neural networks simple by minimizing the description length of the weights , 1993, COLT '93.
[30] Thomas G. Dietterich. Adaptive computation and machine learning , 1998 .
[31] Eric R. Ziegel,et al. The Elements of Statistical Learning , 2003, Technometrics.
[32] Anant Sahai,et al. Harmless interpolation of noisy data in regression , 2019, 2019 IEEE International Symposium on Information Theory (ISIT).
[33] Nathan Srebro,et al. Exploring Generalization in Deep Learning , 2017, NIPS.
[34] Kanter,et al. Eigenvalues of covariance matrices: Application to neural-network learning. , 1991, Physical review letters.
[35] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[36] Stefan Wager,et al. High-Dimensional Asymptotics of Prediction: Ridge Regression and Classification , 2015, 1507.03003.
[37] Opper,et al. Generalization ability of perceptrons with continuous outputs. , 1993, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.
[38] Elie Bienenstock,et al. Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.
[39] Quoc V. Le,et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent , 2017, ICLR.
[40] John E. Moody,et al. The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems , 1991, NIPS.
[41] Geoffrey E. Hinton,et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.
[42] Arthur Jacot,et al. Neural tangent kernel: convergence and generalization in neural networks (invited paper) , 2018, NeurIPS.
[43] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[44] Neeraj Pradhan,et al. Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro , 2019, ArXiv.
[45] Philip M. Long,et al. Benign overfitting in linear regression , 2019, Proceedings of the National Academy of Sciences.
[46] David J. C. MacKay,et al. Bayesian Interpolation , 1992, Neural Computation.
[47] Christopher K. I. Williams,et al. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) , 2005 .
[48] Mikhail Belkin,et al. Two models of double descent for weak features , 2019, SIAM J. Math. Data Sci..
[49] Micah Goldblum,et al. Understanding Generalization through Visualizations , 2019, ICBINB@NeurIPS.
[50] Yoh-ichi Mototake,et al. Semi-flat minima and saddle points by embedding neural networks to overparameterization , 2019, NeurIPS.
[51] Andrew Gelman,et al. The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo , 2011, J. Mach. Learn. Res..
[52] Andrew Gordon Wilson,et al. Averaging Weights Leads to Wider Optima and Better Generalization , 2018, UAI.
[53] Jason Yosinski,et al. Measuring the Intrinsic Dimension of Objective Landscapes , 2018, ICLR.
[54] Guodong Zhang,et al. Functional Variational Bayesian Neural Networks , 2019, ICLR.
[55] Radford M. Neal,et al. High Dimensional Classification with Bayesian Neural Networks and Dirichlet Diffusion Trees , 2006, Feature Extraction.
[56] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[57] Mikhail Belkin,et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off , 2018, Proceedings of the National Academy of Sciences.
[58] M. Opper,et al. On the ability of the optimal perceptron to generalise , 1990 .
[59] David J. C. MacKay,et al. Bayesian Model Comparison and Backprop Nets , 1991, NIPS.
[60] A. Caponnetto,et al. Optimal Rates for the Regularized Least-Squares Algorithm , 2007, Found. Comput. Math..
[61] Stefano Soatto,et al. Emergence of Invariance and Disentanglement in Deep Representations , 2017, 2018 Information Theory and Applications Workshop (ITA).
[62] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[63] Tong Zhang,et al. Learning Bounds for Kernel Regression Using Effective Data Dimensionality , 2005, Neural Computation.