The Unexpected Deterministic and Universal Behavior of Large Softmax Classifiers
Romain Couillet | Cosme Louart | Mohamed Tamaazousti | Mohamed El Amine Seddik