Model Compression with Generative Adversarial Networks

More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Model compression (also known as distillation) alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. However, when fresh data is unavailable for the compression task, the teacher's training data is typically reused, leading to suboptimal compression. In this work, we propose to augment the compression dataset with synthetic data from a generative adversarial network (GAN) designed to approximate the training data distribution. Our GAN-assisted model compression (GAN-MC) significantly improves student accuracy for expensive models such as large random forests and deep neural networks on both tabular and image datasets. Building on these results, we propose a comprehensive metric---the Compression Score---to evaluate the quality of synthetic datasets based on their induced model compression performance. The Compression Score captures both data diversity and discriminability, and we illustrate its benefits over the popular Inception Score in the context of image classification.
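Below is a minimal sketch of the GAN-MC idea on tabular data: train an expensive teacher, draw synthetic inputs from a generative model fit to the training distribution, label the real and synthetic inputs with the teacher, and fit a cheap student on the augmented, teacher-labeled set. The paper uses a GAN as the generator; here a GaussianMixture stands in only to keep the example self-contained and runnable, and the model choices and hyperparameters are illustrative rather than the paper's.

```python
# Sketch of GAN-assisted model compression (GAN-MC) on synthetic tabular data.
# Assumptions: GaussianMixture is a stand-in for the GAN generator described in
# the paper; teacher/student architectures and hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Train the expensive teacher model.
teacher = RandomForestClassifier(n_estimators=500, random_state=0)
teacher.fit(X_train, y_train)

# 2. Fit a generative model to the training inputs and sample synthetic points
#    (the paper trains a GAN to approximate the training data distribution).
generator = GaussianMixture(n_components=10, random_state=0).fit(X_train)
X_synth, _ = generator.sample(len(X_train))

# 3. Label real + synthetic inputs with the teacher and train the student to mimic it.
X_aug = np.vstack([X_train, X_synth])
y_soft = teacher.predict_proba(X_aug)[:, 1]            # teacher's soft predictions
student = DecisionTreeClassifier(max_depth=8, random_state=0)
student.fit(X_aug, (y_soft > 0.5).astype(int))         # hard-label distillation for simplicity

print("teacher accuracy:", teacher.score(X_test, y_test))
print("student accuracy:", student.score(X_test, y_test))
```

In this sketch the student is trained on thresholded teacher predictions; a closer match to distillation would regress on the teacher's soft probabilities, at the cost of a slightly longer example.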
