Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data