Guanpeng Li | Dingwen Tao | Shuaiwen Leon Song | Sian Jin
[1] Robert Hecht-Nielsen, et al. Theory of the backpropagation neural network, 1989, International Joint Conference on Neural Networks.
[2] Gregory K. Wallace, et al. The JPEG still picture compression standard, 1992.
[3] Peter Deutsch, et al. GZIP file format specification version 4.3, 1996, RFC.
[4] Michael W. Marcellin, et al. JPEG2000 - image compression fundamentals, standards and practice, 2002, The Kluwer International Series in Engineering and Computer Science.
[5] Per Stenström, et al. A Robust Main-Memory Compression Scheme, 2005, ISCA.
[6] Martin Isenburg, et al. Fast and Efficient Compression of Floating-Point Data, 2006, IEEE Transactions on Visualization and Computer Graphics.
[7] Jason Weston, et al. A unified architecture for natural language processing: deep neural networks with multitask learning, 2008, ICML.
[8] Martin Burtscher, et al. FPC: A High-Speed Compressor for Double-Precision Floating-Point Data, 2009, IEEE Transactions on Computers.
[9] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[10] Wei-keng Liao, et al. Data Compression for the Exascale Computing Era - Survey, 2014, Supercomput. Front. Innov.
[11] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[12] Peter Lindstrom, et al. Fixed-Rate Compressed Floating-Point Arrays, 2014, IEEE Transactions on Visualization and Computer Graphics.
[13] Dit-Yan Yeung, et al. Collaborative Deep Learning for Recommender Systems, 2014, KDD.
[14] Dumitru Erhan, et al. Going deeper with convolutions, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[16] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[17] Franck Cappello, et al. Fast Error-Bounded Lossy HPC Data Compression with SZ, 2016, IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[18] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Eric P. Xing, et al. GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server, 2016, EuroSys.
[20] Tianqi Chen, et al. Training Deep Nets with Sublinear Memory Cost, 2016, ArXiv.
[21] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[22] Natalia Gimelshein, et al. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design, 2016, 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[23] Raquel Urtasun, et al. The Reversible Residual Network: Backpropagation Without Storing Activations, 2017, NIPS.
[24] Franck Cappello, et al. Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization, 2017, IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[25] Ben H. H. Juurlink, et al. E^2MC: Entropy Encoding Based Memory Compression for GPUs, 2017, IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[26] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.
[27] Yang You, et al. Scaling SGD Batch Size to 32K for ImageNet Training, 2017, ArXiv.
[28] Denis Foley, et al. Ultra-Performance Pascal GPU and NVLink Interconnect, 2017, IEEE Micro.
[29] Stephen W. Keckler, et al. Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks, 2018, IEEE International Symposium on High Performance Computer Architecture (HPCA).
[30] Fangfang Xia, et al. CANDLE/Supervisor: a workflow framework for machine learning applied to cancer research, 2018, BMC Bioinformatics.
[31] Jun Zhu, et al. Boosting Adversarial Attacks with Momentum, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Franck Cappello, et al. Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets, 2018, IEEE International Conference on Big Data (Big Data).
[33] Zenglin Xu, et al. Superneurons: dynamic GPU memory management for training deep neural networks, 2018, PPoPP.
[34] Erik Cambria, et al. Recent Trends in Deep Learning Based Natural Language Processing, 2017, IEEE Comput. Intell. Mag.
[35] Hai Jin, et al. Layrub: layer-centric GPU memory reuse and data migration in extreme-scale deep learning systems, 2018, PPoPP.
[36] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, ArXiv.
[37] Franck Cappello, et al. Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP, 2018, IEEE Transactions on Parallel and Distributed Systems.
[38] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[39] Franck Cappello, et al. DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression, 2019, HPDC.
[40] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, ArXiv.
[41] Torsten Hoefler, et al. Demystifying Parallel and Distributed Deep Learning, 2018, ACM Comput. Surv.
[42] Franck Cappello, et al. cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data, 2020, PACT.
[43] David W. Nellans, et al. Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs, 2020, ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[44] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[45] Tor M. Aamodt, et al. JPEG-ACT: Accelerating Deep Learning via Transform-based Lossy Compression, 2020, ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).