Progressive Compressed Records: Taking a Byte out of Deep Learning Data

Deep learning training accesses vast amounts of data at high velocity, posing bandwidth challenges for datasets retrieved over commodity networks and storage devices. A common approach to reduce bandwidth involves resizing or compressing data prior to training. We introduce a way to dynamically reduce the overhead of fetching and transporting data with a method we term Progressive Compressed Records (PCRs). PCRs deviate from previous storage formats by combining progressive compression with an efficient on-disk layout to view a single dataset at multiple fidelities---all without adding to the total dataset size. We implement PCRs and evaluate them on a range of datasets: ImageNet, HAM10000, Stanford Cars, and CelebA-HQ. Our results show that: (i) the amount of compression a dataset can tolerate depends on the training task, and (ii) PCRs enable tasks to readily access appropriate levels of compression at runtime---resulting in a 2x speedup in training time on average over baseline formats.
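To make the core mechanism concrete, below is a minimal sketch (not the authors' implementation) of the idea underlying PCRs: a progressive JPEG is a header followed by a sequence of scans, each scan refining image fidelity, so splitting the bytestream at scan boundaries and keeping only the first k scans yields a lower-fidelity view of the same image without re-encoding. The full PCR format further groups scan k of many images into one on-disk "band" so a prefix read returns the whole dataset at a chosen fidelity; the sketch only shows per-image scan splitting and prefix decoding. The helper names (split_scans, at_fidelity) and the file example.jpg are placeholders for illustration.

```python
import io
from PIL import Image, ImageFile

# Tolerate images whose later scans were dropped (hypothetical usage; a
# partially refined progressive JPEG decodes like a truncated file).
ImageFile.LOAD_TRUNCATED_IMAGES = True

SOS, EOI = b"\xff\xda", b"\xff\xd9"  # Start-of-Scan, End-of-Image markers

def split_scans(jpeg_bytes):
    """Return (header, scans) for a progressive JPEG. Inside entropy-coded
    data, every 0xFF byte is stuffed (0xFF00) or a restart marker, so a raw
    SOS marker only appears at a true scan boundary."""
    offs, i = [], jpeg_bytes.find(SOS)
    while i != -1:
        offs.append(i)
        i = jpeg_bytes.find(SOS, i + 2)
    scans = [jpeg_bytes[a:b]
             for a, b in zip(offs, offs[1:] + [len(jpeg_bytes)])]
    return jpeg_bytes[:offs[0]], scans

def at_fidelity(header, scans, k):
    """Reassemble a decodable image from the header and the first k scans."""
    return header + b"".join(scans[:k]) + EOI

# Example: encode one image progressively, then decode a low-fidelity view
# from a prefix of its scans.
buf = io.BytesIO()
Image.open("example.jpg").convert("RGB").save(
    buf, "JPEG", progressive=True, quality=90)
header, scans = split_scans(buf.getvalue())
preview = Image.open(io.BytesIO(at_fidelity(header, scans, 2)))  # 2 scans only
preview.load()
```

Because every image in a band is cut at the same scan index, a training task that tolerates more compression simply reads fewer bands, which is where the bandwidth savings reported in the abstract come from.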
