Progressive Compressed Records: Taking a Byte out of Deep Learning Data

Deep learning training accesses vast amounts of data at high velocity, posing bandwidth challenges for datasets retrieved over commodity networks and storage devices. A common approach to reduce bandwidth involves resizing or compressing data prior to training. We introduce a way to dynamically reduce the overhead of fetching and transporting data with a method we term Progressive Compressed Records (PCRs). PCRs deviate from previous storage formats by combining progressive compression with an efficient on-disk layout to view a single dataset at multiple fidelities---all without adding to the total dataset size. We implement PCRs and evaluate them on a range of datasets: ImageNet, HAM10000, Stanford Cars, and CelebA-HQ. Our results show that (i) the amount of compression a dataset can tolerate depends on the training task, and (ii) PCRs enable tasks to readily access appropriate levels of compression at runtime---resulting in a 2x speedup in training time on average over baseline formats.
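
To make the layout idea concrete, below is a minimal sketch in Python (an illustration under stated assumptions, not the paper's implementation): each image is stored as a progressive JPEG, and scan g of every image in a record is written contiguously as one "scan group", so reading only the first k groups returns every image at fidelity k. Pillow is assumed for encoding; the marker-based scan splitter, the record header, and the names progressive_scans, write_pcr, and read_pcr are hypothetical.

```python
import io
import json
import struct
from PIL import Image

SOS = b"\xff\xda"  # JPEG Start-of-Scan marker


def progressive_scans(image):
    """Encode `image` as a progressive JPEG and split the byte stream at each
    SOS marker. Chunk 0 holds the file headers plus the first scan; every
    later chunk adds one refinement scan. (Within entropy-coded data a 0xFF
    byte is stuffed with 0x00, so a raw search for SOS almost always lands on
    real scan boundaries -- a simplification acceptable for a sketch.)"""
    buf = io.BytesIO()
    image.save(buf, format="JPEG", progressive=True, quality=90)
    data = buf.getvalue()
    cuts, i = [], data.find(SOS)
    while i != -1:
        cuts.append(i)
        i = data.find(SOS, i + 2)
    cuts.append(len(data))
    return [data[:cuts[1]]] + [data[cuts[j]:cuts[j + 1]]
                               for j in range(1, len(cuts) - 1)]


def write_pcr(path, images):
    """Write one record: a 4-byte header length, a JSON index of per-image
    chunk sizes for each scan group, then the scan groups in order of
    increasing fidelity (scan g of every image stored together)."""
    per_image = [progressive_scans(img) for img in images]
    n_groups = min(len(scans) for scans in per_image)
    index = [[len(scans[g]) for scans in per_image] for g in range(n_groups)]
    header = json.dumps(index).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))
        f.write(header)
        for g in range(n_groups):
            for scans in per_image:  # group scan g across all images
                f.write(scans[g])


def read_pcr(path, fidelity):
    """Read only the first `fidelity` scan groups and reassemble one JPEG
    byte string per image; appending an EOI marker makes a truncated
    progressive stream decodable at reduced quality."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<I", f.read(4))
        index = json.loads(f.read(hlen))
        images = [b""] * len(index[0])
        for group in index[:fidelity]:
            for i, size in enumerate(group):
                images[i] += f.read(size)
    return [data + b"\xff\xd9" for data in images]  # EOI terminator


if __name__ == "__main__":
    # Hypothetical usage: two solid-colour images, read back at low fidelity.
    imgs = [Image.new("RGB", (256, 256), c) for c in ("red", "blue")]
    write_pcr("example.pcr", imgs)
    for data in read_pcr("example.pcr", fidelity=2):
        print(len(data), "bytes ->", Image.open(io.BytesIO(data)).size)
```

A low-fidelity call such as read_pcr(path, fidelity=2) touches only a prefix of the record, which is where the bandwidth savings come from; a production format would additionally need per-group offsets for seeking and would shuffle at record granularity rather than per image.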
