Dataset Condensation with Gradient Matching

Efficient training of deep neural networks is an increasingly important problem in the era of sophisticated architectures and large-scale datasets. This paper proposes a training set synthesis technique, called Dataset Condensation, that learns to produce a small set of informative samples such that deep neural networks trained on them from scratch achieve results comparable to training on the original data, at a small fraction of the computational cost. We rigorously evaluate its performance on several computer vision benchmarks and show that it significantly outperforms state-of-the-art methods. Finally, we show promising applications of our method in continual learning and domain adaptation.
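To make the core idea concrete, below is a minimal PyTorch sketch of one gradient-matching step, under stated assumptions: `net` is any differentiable classifier, `x_syn` is a small learnable tensor of synthetic images optimized by `opt_syn`, and `grad_distance` uses a layer-wise cosine distance as one common choice of matching loss. The function names and hyperparameter choices are illustrative, not the paper's released implementation.

```python
# Sketch of dataset condensation via gradient matching (assumptions above).
import torch
import torch.nn.functional as F

def grad_distance(g_syn, g_real):
    # Layer-wise cosine distance between two lists of gradients,
    # summed over layers.
    d = 0.0
    for gs, gr in zip(g_syn, g_real):
        gs, gr = gs.flatten(), gr.flatten()
        d = d + (1.0 - F.cosine_similarity(gs, gr, dim=0))
    return d

def condense_step(net, x_syn, y_syn, x_real, y_real, opt_syn):
    params = [p for p in net.parameters() if p.requires_grad]

    # Gradient of the classification loss on a real batch (treated as a
    # fixed target, hence detached).
    loss_real = F.cross_entropy(net(x_real), y_real)
    g_real = [g.detach() for g in torch.autograd.grad(loss_real, params)]

    # Gradient on the synthetic batch; create_graph=True lets us
    # backpropagate through these gradients into the synthetic images.
    loss_syn = F.cross_entropy(net(x_syn), y_syn)
    g_syn = torch.autograd.grad(loss_syn, params, create_graph=True)

    # Update the synthetic images so the gradient they induce matches
    # the gradient produced by the real data.
    opt_syn.zero_grad()
    grad_distance(g_syn, g_real).backward()
    opt_syn.step()
```

In the full method, such inner steps alternate with updates to the network weights and are repeated across many random network initializations, so that the condensed set remains informative for networks trained from scratch.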
