Revisiting generalization for deep learning: PAC-Bayes, flat minima, and generative models
[1] Gintare Karolina Dziugaite, et al. Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy, 2017, NeurIPS.
[2] Eduardo D. Sontag, et al. Neural Networks with Quadratic VC Dimension, 1995, J. Comput. Syst. Sci.
[3] Toniann Pitassi, et al. Generalization in Adaptive Data Analysis and Holdout Reuse, 2015, NIPS.
[4] Peter Grünwald, et al. A tutorial introduction to the minimum description length principle, 2004, ArXiv.
[5] J. Rissanen. A Universal Prior for Integers and Estimation by Minimum Description Length, 1983.
[6] Shai Shalev-Shwartz, et al. On Graduated Optimization for Stochastic Non-Convex Problems, 2015, ICML.
[7] Andreas Maurer, et al. A Note on the PAC Bayesian Theorem, 2004, ArXiv.
[8] Daniel Kifer, et al. Private Convex Empirical Risk Minimization and High-dimensional Regression, 2012, COLT.
[9] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[10] Pier Giovanni Bissiri, et al. A general framework for updating belief distributions, 2013, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[11] David A. McAllester. A PAC-Bayesian Tutorial with A Dropout Bound, 2013, ArXiv.
[12] O. Kallenberg. Foundations of Modern Probability, 2021, Probability Theory and Stochastic Modelling.
[13] Peter L. Bartlett, et al. Almost Linear VC-Dimension Bounds for Piecewise Polynomial Networks, 1998, Neural Computation.
[14] Diederik P. Kingma, et al. Variational Dropout and the Local Reparameterization Trick, 2015, NIPS.
[15] Christos Dimitrakakis, et al. Robust and Private Bayesian Inference, 2013, ALT.
[16] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[17] Geoffrey E. Hinton, et al. Deep Boltzmann Machines, 2009, AISTATS.
[18] Peter L. Bartlett, et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network, 1998, IEEE Trans. Inf. Theory.
[19] Quoc V. Le, et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent, 2017, ICLR.
[20] Cynthia Dwork, et al. Differential Privacy, 2006, ICALP.
[21] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[22] Arthur Gretton, et al. Demystifying MMD GANs, 2018, ICLR.
[23] Léon Bottou, et al. Wasserstein GAN, 2017, ArXiv.
[24] Ian Goodfellow, et al. Deep Learning with Differential Privacy, 2016, CCS.
[25] M. Tanner, et al. Gibbs posterior for variable selection in high-dimensional classification and data mining, 2008, arXiv:0810.5655.
[26] Andrew Blake, et al. Visual Reconstruction, 1987, MIT Press.
[27] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[28] David A. McAllester. PAC-Bayesian model averaging, 1999, COLT '99.
[29] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[30] Rebecca N. Wright, et al. Differential privacy: an exploration of the privacy-utility landscape, 2013.
[31] David A. McAllester, et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks, 2017, ICLR.
[32] Bernhard Schölkopf, et al. A Kernel Two-Sample Test, 2012, J. Mach. Learn. Res.
[33] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[34] Christopher K. I. Williams, et al. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), 2005.
[35] Shiliang Sun, et al. PAC-Bayes bounds with data dependent priors, 2012, J. Mach. Learn. Res.
[36] Peter Grünwald, et al. The Safe Bayesian - Learning the Learning Rate via the Mixability Gap, 2012, ALT.
[37] Matus Telgarsky, et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis, 2017, COLT.
[38] Geoffrey E. Hinton, et al. Keeping the neural networks simple by minimizing the description length of the weights, 1993, COLT '93.
[39] Peter L. Bartlett, et al. For Valid Generalization the Size of the Weights is More Important than the Size of the Network, 1996, NIPS.
[40] Paul W. Goldberg, et al. Bounding the Vapnik-Chervonenkis Dimension of Concept Classes Parameterized by Real Numbers, 1993, COLT '93.
[41] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[42] Ryota Tomioka, et al. Norm-Based Capacity Control in Neural Networks, 2015, COLT.
[43] Zoubin Ghahramani, et al. Training generative neural networks via Maximum Mean Discrepancy optimization, 2015, UAI.
[44] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, Information Theory and Applications Workshop (ITA).
[45] V. Koltchinskii, et al. Empirical margin distributions and bounding the generalization error of combined classifiers, 2002, arXiv:math/0405343.
[46] John Shawe-Taylor, et al. Tighter PAC-Bayes bounds through distribution-dependent priors, 2013, Theor. Comput. Sci.
[47] Alexandre Lacoste, et al. PAC-Bayesian Theory Meets Bayesian Inference, 2016, NIPS.
[48] Christian Igel, et al. A Strongly Quasiconvex PAC-Bayesian Bound, 2016, ALT.
[49] Richard S. Zemel, et al. Generative Moment Matching Networks, 2015, ICML.
[50] Stefano Soatto, et al. Information Dropout: Learning Optimal Representations Through Noisy Computation, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[51] François Laviolette, et al. PAC-Bayesian Bounds based on the Rényi Divergence, 2016, AISTATS.
[52] Ruslan Salakhutdinov, et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks, 2015, NIPS.
[53] Davide Anguita, et al. Differential privacy and generalization: Sharper bounds with applications, 2017, Pattern Recognit. Lett.
[54] Raef Bassily, et al. Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds, 2014, arXiv:1405.7085.
[55] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[56] Gintare Karolina Dziugaite, et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, 2017, UAI.
[57] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[58] Stefano Soatto, et al. Emergence of invariance and disentangling in deep representations, 2017.
[59] David A. McAllester. PAC-Bayesian Stochastic Model Selection, 2003, Machine Learning.
[60] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[61] Carlo Baldassi, et al. Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses, 2015, Physical Review Letters.
[62] Peter Grünwald, et al. Fast Rates for General Unbounded Loss Functions: From ERM to Generalized Bayes, 2016, J. Mach. Learn. Res.
[63] Aaron Roth, et al. The Algorithmic Foundations of Differential Privacy, 2014, Found. Trends Theor. Comput. Sci.
[64] Eric Saund, et al. Dimensionality-Reduction Using Connectionist Networks, 1989, IEEE Trans. Pattern Anal. Mach. Intell.
[65] Bernhard Schölkopf, et al. Injective Hilbert Space Embeddings of Probability Measures, 2008, COLT.
[66] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[67] Shahar Mendelson, et al. A Few Notes on Statistical Learning Theory, 2002, Machine Learning Summer School.
[68] D. MacKay, et al. Bayesian neural networks and density networks, 1995.
[69] Yiming Yang, et al. MMD GAN: Towards Deeper Understanding of Moment Matching Network, 2017, NIPS.
[70] Hugo Larochelle, et al. A Deep and Tractable Density Estimator, 2013, ICML.
[71] Tong Zhang. From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation, 2006, arXiv:math/0702653.
[72] Peter Grünwald, et al. Fast Rates with Unbounded Losses, 2016, ArXiv.
[73] Toniann Pitassi, et al. Preserving Statistical Validity in Adaptive Data Analysis, 2014, STOC.
[74] Hiroshi Nakagawa, et al. Differential Privacy without Sensitivity, 2016, NIPS.
[75] Peter Grünwald, et al. A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity, 2017, ALT.
[76] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[77] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[78] Christian Borgs, et al. Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes, 2016, Proceedings of the National Academy of Sciences.
[79] Kenji Yamanishi. Extended Stochastic Complexity and Minimax Relative Loss Analysis, 1999, ALT.
[80] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[81] Tengyu Ma, et al. On the Ability of Neural Nets to Express Distributions, 2017, COLT.
[82] John Shawe-Taylor, et al. A PAC analysis of a Bayesian estimator, 1997, COLT '97.
[83] O. Catoni. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, 2007, arXiv:0712.0248.
[84] Anand D. Sarwate, et al. Differentially Private Empirical Risk Minimization, 2009, J. Mach. Learn. Res.
[85] David M. Blei, et al. Variational Inference: A Review for Statisticians, 2016, ArXiv.
[86] Alexander J. Smola, et al. Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo, 2015, ICML.
[87] Raef Bassily, et al. Algorithmic stability for adaptive data analysis, 2015, STOC.
[88] John Langford, et al. (Not) Bounding the True Error, 2001, NIPS.
[89] Kunal Talwar, et al. Mechanism Design via Differential Privacy, 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07).
[90] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[91] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[92] Behnam Neyshabur, et al. Implicit Regularization in Deep Learning, 2017, ArXiv.
[93] Kenji Yamanishi, et al. A Decision-Theoretic Extension of Stochastic Complexity and Its Applications to Learning, 1998, IEEE Trans. Inf. Theory.
[94] Naftali Tishby, et al. Opening the Black Box of Deep Neural Networks via Information, 2017, ArXiv.
[95] John Langford, et al. Quantitatively tight sample complexity bounds, 2002.
[96] Cynthia Dwork, et al. Differential Privacy: A Survey of Results, 2008, TAMC.
[97] Pierre Alquier, et al. Simpler PAC-Bayesian bounds for hostile data, 2016, Machine Learning.
[98] Alexander J. Smola, et al. Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, 2016, ICLR.
[99] Yee Whye Teh, et al. Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics, 2014, J. Mach. Learn. Res.
[100] P. Grünwald. The Minimum Description Length Principle (Adaptive Computation and Machine Learning), 2007.
[101] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.