Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training
Cong Fang | Hangfeng He | Qi Long | Weijie J. Su
[1] Dustin G. Mixon, et al. Neural collapse with unconstrained features, 2020, Sampling Theory, Signal Processing, and Data Analysis.
[2] Qianli Liao, et al. Explicit regularization and implicit bias in deep network classifiers trained with the square loss, 2020, ArXiv.
[3] Cong Fang, et al. Mathematical Models of Overparameterized Neural Networks, 2020, Proceedings of the IEEE.
[4] Cong Fang, et al. Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks, 2020, COLT.
[5] Ohad Shamir, et al. Gradient Methods Never Overfit On Separable Data, 2020, J. Mach. Learn. Res.
[6] Kristina Lerman, et al. A Survey on Bias and Fairness in Machine Learning, 2019, ACM Comput. Surv.
[7] Cong Ma, et al. A Selective Overview of Deep Learning, 2019, Statistical Science: A Review Journal of the Institute of Mathematical Statistics.
[8] D. Tao, et al. Recent advances in deep learning theory, 2020, ArXiv.
[9] Weinan E, et al. On the emergence of tetrahedral symmetry in the final and penultimate layers of neural network classifiers, 2020, ArXiv.
[10] David L. Donoho, et al. Prevalence of neural collapse during the terminal phase of deep learning training, 2020, Proceedings of the National Academy of Sciences.
[11] Abdel-rahman Mohamed, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.
[12] Chong You, et al. Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction, 2020, NeurIPS.
[13] Qianli Liao, et al. Theoretical issues in deep networks, 2020, Proceedings of the National Academy of Sciences.
[14] Ce Liu, et al. Supervised Contrastive Learning, 2020, NeurIPS.
[15] Mert Pilanci, et al. Convex Duality of Deep Neural Networks, 2020, ArXiv.
[16] Geoffrey E. Hinton, et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.
[17] Weijie J. Su, et al. The Local Elasticity of Neural Networks, 2019, ICLR.
[18] Philip M. Long, et al. Benign overfitting in linear regression, 2019, Proceedings of the National Academy of Sciences.
[19] Lei Wu, et al. A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics, 2019, Science China Mathematics.
[20] Samet Oymak, et al. Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks, 2019, IEEE Journal on Selected Areas in Information Theory.
[21] Justin A. Sirignano, et al. Mean field analysis of neural networks: A central limit theorem, 2018, Stochastic Processes and their Applications.
[22] René Vidal, et al. Structured Low-Rank Matrix Factorization: Global Optimality, Algorithms, and Applications, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[23] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[24] Colin Wei, et al. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss, 2019, NeurIPS.
[25] Sanjeev Arora, et al. Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets, 2019, NeurIPS.
[26] Taghi M. Khoshgoftaar, et al. Survey on deep learning with class imbalance, 2019, J. Big Data.
[27] Mikhail Khodak, et al. A Theoretical Analysis of Contrastive Unsupervised Representation Learning, 2019, ICML.
[28] Qi Xie, et al. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting, 2019, NeurIPS.
[29] Yang Song, et al. Class-Balanced Loss Based on Effective Number of Samples, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[31] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[32] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, ArXiv.
[33] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[34] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[35] Tengyuan Liang, et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize, 2018, The Annals of Statistics.
[36] James Zou, et al. AI can be sexist and racist — it’s time to make it fair, 2018, Nature.
[37] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[38] Grant M. Rotskoff, et al. Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error, 2018, ArXiv.
[39] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[40] Timnit Gebru, et al. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, 2018, FAT.
[41] Raef Bassily, et al. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning, 2017, ICML.
[42] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[43] Atsuto Maki, et al. A systematic study of the class imbalance problem in convolutional neural networks, 2017, Neural Networks.
[44] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[45] Grant M. Rotskoff, et al. Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks, 2018, NeurIPS.
[46] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[47] Dmitry Yarotsky, et al. Error bounds for approximations with deep ReLU networks, 2016, Neural Networks.
[48] M. Ramaswami, et al. Data Imbalance and Classifiers: Impact and Solutions from a Big Data Perspective, 2017.
[49] Bolei Zhou, et al. Places: An Image Database for Deep Scene Understanding, 2016, ArXiv.
[50] Longbing Cao, et al. Training deep neural networks on imbalanced data sets, 2016, 2016 International Joint Conference on Neural Networks (IJCNN).
[51] Chen Huang, et al. Learning Deep Representation for Imbalanced Classification, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[53] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[55] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[56] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[57] Frederick Vu. Central Limit Theorem, 2015.
[58] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[59] Walter Daelemans, et al. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, EMNLP 2014.
[60] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[61] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[62] Jean Ponce, et al. Convex Sparse Matrix Factorizations, 2008, ArXiv.
[63] Jeff A. Bilmes, et al. Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, 2006, HLT-NAACL 2006.
[64] Mitchell P. Marcus, et al. OntoNotes: The 90% Solution, 2006, NAACL.
[65] Shuzhong Zhang, et al. On Cones of Nonnegative Quadratic Functions, 2003, Math. Oper. Res.
[66] Thomas Strohmer, et al. Grassmannian frames with applications to coding and communication, 2003, math/0301135.
[67] David Lowe, et al. The optimised internal representation of multilayer classifier networks performs nonlinear discriminant analysis, 1990, Neural Networks.
[68] Masaaki Kijima, et al. THEORY AND ALGORITHMS OF, 1988.