Deep Learning on a Data Diet: Finding Important Examples Early in Training

The recent success of deep learning has been driven in part by training increasingly overparameterized networks on ever larger datasets. It is therefore natural to ask: how much of the data is superfluous, which examples are important for generalization, and how do we find them? In this work, we make the striking observation that, on standard vision benchmarks, the initial loss gradient norm of individual training examples, averaged over several weight initializations, can be used to identify a smaller set of training data that is important for generalization. Furthermore, after only a few epochs of training, the information in gradient norms is reflected in the normed error, the L2 distance between the predicted class probabilities and the one-hot label, which can be used to prune a significant fraction of the dataset without sacrificing test accuracy. Based on this, we propose data pruning methods that use only local information early in training, and connect them to recent work that prunes data by discarding examples that are rarely forgotten over the course of training. Our methods also shed light on how the underlying data distribution shapes the training dynamics: they rank examples by their importance for generalization, detect noisy examples, and identify subspaces of the model's data representation that are relatively stable over training.
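To make the error-based score concrete, here is a minimal sketch of computing the L2 distance between predicted class probabilities and one-hot labels (the paper's "EL2N"-style score) and keeping only the highest-scoring examples. The function names, the keep fraction, and the random stand-in data are illustrative assumptions; in practice the probabilities would come from a network trained for a few epochs, with scores averaged over several runs or initializations.

```python
import numpy as np

def el2n_scores(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-example error L2-norm score.

    probs:  (n_examples, n_classes) softmax outputs from a briefly trained model
            (stand-in inputs for this sketch).
    labels: (n_examples,) integer class labels.
    """
    one_hot = np.eye(probs.shape[1])[labels]          # (n, c) one-hot targets
    return np.linalg.norm(probs - one_hot, axis=1)    # L2 distance per example

def prune_dataset(scores: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Return indices of the highest-scoring (hardest) examples to keep."""
    n_keep = int(len(scores) * keep_fraction)
    return np.argsort(scores)[-n_keep:]

# Usage with random stand-in data; real scores would be computed from a
# partially trained network and averaged over several initializations.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=1000)         # fake softmax outputs
labels = rng.integers(0, 10, size=1000)
scores = el2n_scores(probs, labels)
keep_idx = prune_dataset(scores, keep_fraction=0.5)
```

Keeping the highest-scoring examples reflects the abstract's claim that hard, high-error examples carry most of the information needed for generalization, while low-scoring examples can be pruned with little loss in test accuracy.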
