CSWAP: A Self-Tuning Compression Framework for Accelerating Tensor Swapping in GPUs
Xian-He Sun | Yanlong Yin | Xuechen Zhang | Ping Chen | Shuibing He | Shuaiben Chen | Peiyi Hong | Gang Chen