Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds

We present an efficient coreset-based neural network compression algorithm that sparsifies the parameters of a trained fully-connected neural network in a manner that provably approximates the network's output. Our approach is based on an importance sampling scheme that judiciously defines a sampling distribution over the neural network parameters and, as a result, retains parameters of high importance while discarding redundant ones. We leverage a novel, empirical notion of sensitivity and extend traditional coreset constructions to the task of compressing network parameters. Our theoretical analysis establishes guarantees on the size and accuracy of the resulting compressed network and gives rise to generalization bounds that may provide new insights into the generalization properties of neural networks. We demonstrate the practical effectiveness of our algorithm on a variety of neural network configurations and real-world data sets.
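To make the importance-sampling idea concrete, the following is a minimal Python sketch of sensitivity-based sampling for a single neuron's incoming weights. It assumes non-negative weights and activations and uses illustrative names (compress_neuron, activations, m); it is a simplified stand-in rather than the authors' exact construction, which also handles signed weights and comes with formal size and error guarantees.

```python
import numpy as np

def compress_neuron(w, activations, m, rng=None):
    """Sparsify the incoming weights of one neuron via importance sampling.

    w            -- 1D array of non-negative incoming edge weights (length d)
    activations  -- 2D array (n_points x d) of non-negative inputs to this
                    neuron, evaluated on a small batch of data points
    m            -- number of weighted samples (coreset size) to draw

    Returns a sparse, reweighted weight vector whose pre-activation
    approximates that of the original neuron on similar inputs.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Empirical sensitivity of each edge: its largest relative contribution
    # to the neuron's pre-activation over the evaluation points.
    contrib = activations * w                      # (n_points, d)
    totals = contrib.sum(axis=1, keepdims=True)    # (n_points, 1)
    sensitivity = (contrib / np.maximum(totals, 1e-12)).max(axis=0)

    # Sampling distribution proportional to empirical sensitivity.
    probs = sensitivity / sensitivity.sum()

    # Draw m edges i.i.d. with replacement and reweight by 1 / (m * p_j),
    # which keeps the estimate of the pre-activation unbiased.
    idx = rng.choice(len(w), size=m, p=probs)
    w_hat = np.zeros_like(w, dtype=float)
    np.add.at(w_hat, idx, w[idx] / (m * probs[idx]))
    return w_hat
```

In this sketch, edges whose weighted activations dominate the neuron's output on the sample data receive high sensitivity and are almost always kept, while consistently negligible edges are sampled rarely and typically dropped, yielding a data-dependent sparsification of the layer.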
