Progressive Compressed Records: Taking a Byte out of Deep Learning Data

Deep learning training accesses vast amounts of data at high velocity, posing bandwidth challenges for datasets retrieved over commodity networks and storage devices. A common approach to reduce bandwidth involves resizing or compressing data prior to training. We introduce a way to dynamically reduce the overhead of fetching and transporting data with a method we term Progressive Compressed Records (PCRs). PCRs deviate from previous storage formats by combining progressive compression with an efficient on-disk layout to view a single dataset at multiple fidelities---all without adding to the total dataset size. We implement PCRs and evaluate them on a range of datasets: ImageNet, HAM10000, Stanford Cars, and CelebA-HQ. Our results show that: (i) the amount of compression a dataset can tolerate depends on the training task, and (ii) PCRs enable tasks to readily access appropriate levels of compression at runtime---resulting in a 2x speedup in training time on average over baseline formats.
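To make the core mechanism concrete, below is a minimal sketch (not the authors' implementation) of the idea underlying PCRs: a progressive JPEG is a header followed by a sequence of scans, each scan refining image fidelity, so splitting the bytestream at scan boundaries and keeping only the first k scans yields a lower-fidelity view of the same image without re-encoding. The full PCR format further groups scan k of many images into one on-disk "band" so a prefix read returns the whole dataset at a chosen fidelity; the sketch only shows per-image scan splitting and prefix decoding. The helper names (split_scans, at_fidelity) and the file example.jpg are placeholders for illustration.

```python
import io
from PIL import Image, ImageFile

# Tolerate images whose later scans were dropped (hypothetical usage; a
# partially refined progressive JPEG decodes like a truncated file).
ImageFile.LOAD_TRUNCATED_IMAGES = True

SOS, EOI = b"\xff\xda", b"\xff\xd9"  # Start-of-Scan, End-of-Image markers

def split_scans(jpeg_bytes):
    """Return (header, scans) for a progressive JPEG. Inside entropy-coded
    data, every 0xFF byte is stuffed (0xFF00) or a restart marker, so a raw
    SOS marker only appears at a true scan boundary."""
    offs, i = [], jpeg_bytes.find(SOS)
    while i != -1:
        offs.append(i)
        i = jpeg_bytes.find(SOS, i + 2)
    scans = [jpeg_bytes[a:b]
             for a, b in zip(offs, offs[1:] + [len(jpeg_bytes)])]
    return jpeg_bytes[:offs[0]], scans

def at_fidelity(header, scans, k):
    """Reassemble a decodable image from the header and the first k scans."""
    return header + b"".join(scans[:k]) + EOI

# Example: encode one image progressively, then decode a low-fidelity view
# from a prefix of its scans.
buf = io.BytesIO()
Image.open("example.jpg").convert("RGB").save(
    buf, "JPEG", progressive=True, quality=90)
header, scans = split_scans(buf.getvalue())
preview = Image.open(io.BytesIO(at_fidelity(header, scans, 2)))  # 2 scans only
preview.load()
```

Because every image in a band is cut at the same scan index, a training task that tolerates more compression simply reads fewer bands, which is where the bandwidth savings reported in the abstract come from.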
