Small-GAN: Speeding Up GAN Training Using Core-sets

Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be useful to generate batches that are effectively large though actually small. In this work, we propose such a method, inspired by the use of core-set selection in active learning. When training a GAN, we draw a large batch of samples from the prior and then compress that batch using core-set selection. To create effectively large batches of 'real' images, we build a cached dataset of Inception activations for each training image, randomly project the activations down to a smaller dimension, and then apply core-set selection to those projected activations at training time. We conduct experiments showing that this technique substantially reduces training time and memory usage for modern GAN variants, that it reduces the fraction of dropped modes in a synthetic dataset, and that it allows GANs to reach a new state of the art in anomaly detection.
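
The two selection steps described above can be made concrete with a short sketch. The following is a minimal NumPy illustration, assuming greedy k-center (farthest-point) selection as the core-set heuristic and a Gaussian Johnson-Lindenstrauss-style random projection; the function names, the 4x oversampling factor, and the projection dimension are illustrative assumptions, not the paper's exact implementation.

    import numpy as np

    def greedy_coreset(points, k, rng=None):
        # Greedy k-center (farthest-point) selection: repeatedly pick the
        # point farthest from the current set of centers. Returns the
        # indices of the k selected rows of `points` (an n x d array).
        rng = np.random.default_rng(rng)
        selected = [int(rng.integers(len(points)))]
        # Distance from each point to its nearest selected center so far.
        dists = np.linalg.norm(points - points[selected[0]], axis=1)
        for _ in range(k - 1):
            idx = int(np.argmax(dists))
            selected.append(idx)
            dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
        return np.array(selected)

    def sample_prior_batch(batch_size, latent_dim, oversample=4, rng=None):
        # Draw an oversampled batch of latents from the Gaussian prior,
        # then compress it to `batch_size` samples with core-set selection,
        # so the small batch covers the prior roughly as well as the large one.
        rng = np.random.default_rng(rng)
        big_batch = rng.standard_normal((batch_size * oversample, latent_dim))
        return big_batch[greedy_coreset(big_batch, batch_size, rng)]

    def select_real_batch(cached_activations, batch_size, proj_dim=32, rng=None):
        # Randomly project cached Inception activations down to `proj_dim`
        # dimensions (a Johnson-Lindenstrauss-style projection) and run
        # core-set selection on the projections; the returned indices
        # identify which real images go into the effective batch.
        rng = np.random.default_rng(rng)
        d = cached_activations.shape[1]
        proj = rng.standard_normal((d, proj_dim)) / np.sqrt(proj_dim)
        return greedy_coreset(cached_activations @ proj, batch_size, rng)

At each training step one would presumably apply select_real_batch to a large randomly drawn candidate subset of the cached activations rather than the full dataset, mirroring the oversample-then-compress pattern used on the prior side.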

[1] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.

[2] Yingyu Liang, et al. Generalization and Equilibrium in Generative Adversarial Nets (GANs), 2017, ICML.

[3] Yoshua Bengio, et al. How transferable are features in deep neural networks?, 2014, NIPS.

[4] Kasturi R. Varadarajan, et al. Geometric Approximation via Coresets, 2007.

[5] José Bento, et al. Generative Adversarial Active Learning, 2017, ArXiv.

[6] Trevor Darrell, et al. Discriminator Rejection Sampling, 2018, ICLR.

[7] Augustus Odena, et al. Open Questions about Generative Adversarial Networks, 2019, Distill.

[8] Chuan Sheng Foo, et al. Efficient GAN-Based Anomaly Detection, 2018, ArXiv.

[9] Fabián A. Chudak, et al. Near-optimal solutions to large-scale facility location problems, 2005, Discret. Optim.

[10] 拓海 杉山, et al. Study report on "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", 2017.

[11] Yinda Zhang, et al. LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop, 2015, ArXiv.

[12] Lars M. Mescheder, et al. On the convergence properties of GAN training, 2018, ArXiv.

[13] Andrew M. Dai, et al. MaskGAN: Better Text Generation via Filling in the ______, 2018, ICLR.

[14] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.

[15] Silvio Savarese, et al. Active Learning for Convolutional Neural Networks: A Core-Set Approach, 2017, ICLR.

[16] Aaron C. Courville, et al. Improved Training of Wasserstein GANs, 2017, NIPS.

[17] J. Zico Kolter, et al. Gradient descent GAN optimization is locally stable, 2017, NIPS.

[18] Kenneth L. Clarkson, et al. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, 2008, SODA '08.

[19] Rameshwar Pratap, et al. Faster Coreset Construction for Projective Clustering via Low-Rank Approximation, 2018, IWOCA.

[20] Soumith Chintala, et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015, ICLR.

[21] Vladimir Braverman, et al. Data-Independent Neural Pruning via Coresets, 2020, ICLR.

[22] Yi Zhang, et al. Do GANs learn the distribution? Some Theory and Empirics, 2018, ICLR.

[23] Jerry Li, et al. Towards Understanding the Dynamics of Generative Adversarial Networks, 2017, ArXiv.

[24] Ivor W. Tsang, et al. Core Vector Machines: Fast SVM Training on Very Large Data Sets, 2005, J. Mach. Learn. Res.

[25] Dimitris N. Metaxas, et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks, 2017, ICCV.

[26] Ian J. Goodfellow, et al. Skill Rating for Generative Models, 2018, ArXiv.

[27] Yong Yu, et al. Long Text Generation via Adversarial Training with Leaked Information, 2017, AAAI.

[28] Jeff M. Phillips, et al. Coresets and Sketches, 2016, ArXiv.

[29] Han Zhang, et al. Improving GANs Using Optimal Transport, 2018, ICLR.

[30] A. J. Goldman. Optimal Center Location in Simple Networks, 1971.

[31] Han Zhang, et al. Self-Attention Generative Adversarial Networks, 2018, ICML.

[32] Alexei A. Efros, et al. Image-to-Image Translation with Conditional Adversarial Networks, 2017, CVPR.

[33] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.

[34] Piotr Indyk, et al. Approximate clustering via core-sets, 2002, STOC '02.

[35] Sridhar Mahadevan, et al. Generative Multi-Adversarial Networks, 2016, ICLR.

[36] Jascha Sohl-Dickstein, et al. Measuring the Effects of Data Parallelism on Neural Network Training, 2018, J. Mach. Learn. Res.

[37] Franziska Abend, et al. Facility Location: Concepts, Models, Algorithms and Case Studies, 2016.

[38] Trevor Darrell, et al. Variational Adversarial Active Learning, 2019, ICCV.

[39] Jeff A. Bilmes, et al. Using Document Summarization Techniques for Speech Data Subset Selection, 2013, NAACL.

[40] Sepp Hochreiter, et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.

[41] Andreas Krause, et al. Scalable Training of Mixture Models via Coresets, 2011, NIPS.

[42] Raymond Y. K. Lau, et al. Least Squares Generative Adversarial Networks, 2017, ICCV.

[43] Varun Chandola, et al. Anomaly detection: A survey, 2009, CSUR.

[44] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[45] Trevor Campbell, et al. Coresets for Scalable Bayesian Logistic Regression, 2016, NIPS.

[46] Marc G. Bellemare, et al. The Cramer Distance as a Solution to Biased Wasserstein Gradients, 2017, ArXiv.

[47] Tatjana Chavdarova, et al. Reducing Noise in GAN Training with Variance Reduced Extragradient, 2019, NeurIPS.

[48] Sariel Har-Peled, et al. Smaller Coresets for k-Median and k-Means Clustering, 2005, SCG.

[49] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.

[50] Bernd Girod, et al. What's wrong with mean-squared error?, 1993.

[51] Paul S. Fisher, et al. Image quality measures and their performance, 1995, IEEE Trans. Commun.

[52] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[53] Vladimir Braverman, et al. On Activation Function Coresets for Network Pruning, 2019, ArXiv.

[54] Sebastian Nowozin, et al. Which Training Methods for GANs do actually Converge?, 2018, ICML.

[55] Christian Ledig, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2017, CVPR.

[56] Yoshua Bengio, et al. Maximum Entropy Generators for Energy-Based Models, 2019, ArXiv.

[57] Yiming Yang, et al. MMD GAN: Towards Deeper Understanding of Moment Matching Network, 2017, NIPS.

[58] Tatjana Chavdarova, et al. SGAN: An Alternative Training of Generative Adversarial Networks, 2018, CVPR.

[59] Jinoh Kim, et al. A survey of deep learning-based network anomaly detection, 2017, Cluster Computing.

[60] Bernt Schiele, et al. Feature Generating Networks for Zero-Shot Learning, 2018, CVPR.

[61] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[62] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.

[63] Laurence A. Wolsey, et al. Integer and Combinatorial Optimization, 1988.

[64] Jeff Donahue, et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018, ICLR.

[65] David M. Blei, et al. Prescribed Generative Adversarial Networks, 2019, ArXiv.

[66] Richard E. Turner, et al. Variational Continual Learning, 2017, ICLR.

[67] Andreas Krause, et al. Practical Coreset Constructions for Machine Learning, 2017, ArXiv.

[68] Ted K. Ralphs, et al. Integer and Combinatorial Optimization, 2013.

[69] Gregory Piatetsky-Shapiro, et al. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality, 2000.

[70] Gauthier Gidel, et al. A Variational Inequality Perspective on Generative Adversarial Networks, 2018, ICLR.

[71] Sanjoy Dasgupta, et al. An elementary proof of a theorem of Johnson and Lindenstrauss, 2003, Random Struct. Algorithms.

[72] Andreas Krause, et al. Training Gaussian Mixture Models at Scale via Coresets, 2017, J. Mach. Learn. Res.

[73] Sariel Har-Peled, et al. On coresets for k-means and k-median clustering, 2004, STOC '04.

[74] Honglak Lee, et al. Consistency Regularization for Generative Adversarial Networks, 2020, ICLR.