Progressive Compressed Records: Taking a Byte out of Deep Learning Data

Deep learning training accesses vast amounts of data at high velocity, posing bandwidth challenges for datasets retrieved over commodity networks and storage devices. A common approach to reduce bandwidth involves resizing or compressing data prior to training. We introduce a way to dynamically reduce the overhead of fetching and transporting data with a method we term Progressive Compressed Records (PCRs). PCRs deviate from previous storage formats by combining progressive compression with an efficient on-disk layout to view a single dataset at multiple fidelities---all without adding to the total dataset size. We implement PCRs and evaluate them on a range of datasets: ImageNet, HAM10000, Stanford Cars, and CelebA-HQ. Our results show that (i) the amount of compression a dataset can tolerate depends on the training task, and (ii) PCRs enable tasks to readily access appropriate levels of compression at runtime---resulting in a 2x speedup in training time on average over baseline formats.
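
To make the layout idea concrete, below is a minimal sketch in Python (an illustration under stated assumptions, not the paper's implementation): each image is stored as a progressive JPEG, and scan g of every image in a record is written contiguously as one "scan group", so reading only the first k groups returns every image at fidelity k. Pillow is assumed for encoding; the marker-based scan splitter, the record header, and the names progressive_scans, write_pcr, and read_pcr are hypothetical.

```python
import io
import json
import struct
from PIL import Image

SOS = b"\xff\xda"  # JPEG Start-of-Scan marker


def progressive_scans(image):
    """Encode `image` as a progressive JPEG and split the byte stream at each
    SOS marker. Chunk 0 holds the file headers plus the first scan; every
    later chunk adds one refinement scan. (Within entropy-coded data a 0xFF
    byte is stuffed with 0x00, so a raw search for SOS almost always lands on
    real scan boundaries -- a simplification acceptable for a sketch.)"""
    buf = io.BytesIO()
    image.save(buf, format="JPEG", progressive=True, quality=90)
    data = buf.getvalue()
    cuts, i = [], data.find(SOS)
    while i != -1:
        cuts.append(i)
        i = data.find(SOS, i + 2)
    cuts.append(len(data))
    return [data[:cuts[1]]] + [data[cuts[j]:cuts[j + 1]]
                               for j in range(1, len(cuts) - 1)]


def write_pcr(path, images):
    """Write one record: a 4-byte header length, a JSON index of per-image
    chunk sizes for each scan group, then the scan groups in order of
    increasing fidelity (scan g of every image stored together)."""
    per_image = [progressive_scans(img) for img in images]
    n_groups = min(len(scans) for scans in per_image)
    index = [[len(scans[g]) for scans in per_image] for g in range(n_groups)]
    header = json.dumps(index).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))
        f.write(header)
        for g in range(n_groups):
            for scans in per_image:  # group scan g across all images
                f.write(scans[g])


def read_pcr(path, fidelity):
    """Read only the first `fidelity` scan groups and reassemble one JPEG
    byte string per image; appending an EOI marker makes a truncated
    progressive stream decodable at reduced quality."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<I", f.read(4))
        index = json.loads(f.read(hlen))
        images = [b""] * len(index[0])
        for group in index[:fidelity]:
            for i, size in enumerate(group):
                images[i] += f.read(size)
    return [data + b"\xff\xd9" for data in images]  # EOI terminator


if __name__ == "__main__":
    # Hypothetical usage: two solid-colour images, read back at low fidelity.
    imgs = [Image.new("RGB", (256, 256), c) for c in ("red", "blue")]
    write_pcr("example.pcr", imgs)
    for data in read_pcr("example.pcr", fidelity=2):
        print(len(data), "bytes ->", Image.open(io.BytesIO(data)).size)
```

A low-fidelity call such as read_pcr(path, fidelity=2) touches only a prefix of the record, which is where the bandwidth savings come from; a production format would additionally need per-group offsets for seeking and would shuffle at record granularity rather than per image.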
