Hotness- and Lifetime-Aware Data Placement and Migration for High-Performance Deep Learning on Heterogeneous Memory Systems
Woongki Baek | Myeonggyun Han | Jihoon Hyun | Seongbeom Park
[1] Trishul M. Chilimbi,et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.
[2] Yen-Chen Liu,et al. Knights Landing: Second-Generation Intel Xeon Phi Product , 2016, IEEE Micro.
[3] Ian Miguel,et al. The Temporal Knapsack Problem and Its Solution , 2005, CPAIOR.
[4] Kyu Yeun Kim,et al. BLPP: Improving the Performance of GPGPUs with Heterogeneous Memory through Bandwidth- and Latency-Aware Page Placement , 2018, 2018 IEEE 36th International Conference on Computer Design (ICCD).
[5] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[6] Aamer Jaleel,et al. ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[7] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[8] Mainak Chaudhuri,et al. Near-Optimal Access Partitioning for Memory Hierarchies with Multiple Heterogeneous Bandwidth Sources , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[9] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[10] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009.
[11] Hao Wang,et al. DUANG: Fast and lightweight page migration in asymmetric memory systems , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[12] Sabela Ramos,et al. Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[13] Natalia Gimelshein,et al. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[15] Srinivas Devadas,et al. Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017.
[18] Samy Bengio,et al. Device Placement Optimization with Reinforcement Learning , 2017, ICML.
[19] Lizy Kurian John,et al. A Case for Granularity Aware Page Migration , 2018, ICS.
[20] Gu-Yeon Wei,et al. Fathom: reference workloads for modern deep learning methods , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).
[21] Vivien Quéma,et al. Large Pages May Be Harmful on NUMA Systems , 2014, USENIX Annual Technical Conference.
[22] Xiaowei Li,et al. C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).
[23] Stephen W. Keckler,et al. Page Placement Strategies for GPUs within Heterogeneous Memory Systems , 2015, ASPLOS.
[24] Thomas F. Wenisch,et al. High-Performance Transactions for Persistent Memories , 2016, ASPLOS.
[25] Joel Emer,et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.
[26] Amar Phanishayee,et al. Gist: Efficient Data Encoding for Deep Neural Network Training , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[27] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[28] Zi Yan,et al. Nimble Page Management for Tiered Memory Systems , 2019, ASPLOS.
[29] Woongki Baek,et al. Design and implementation of bandwidth-aware memory placement and migration policies for heterogeneous memory systems , 2017, ICS '17.
[30] Zenglin Xu,et al. Superneurons: dynamic GPU memory management for training deep neural networks , 2018, PPoPP.
[31] Sanjay Kumar,et al. System software for persistent memory , 2014, EuroSys '14.
[32] Minsoo Rhu,et al. Beyond the Memory Wall: A Case for Memory-Centric HPC System for Deep Learning , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[33] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[34] Xiaoming Chen,et al. moDNN: Memory optimal DNN training on GPUs , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[35] Aamer Jaleel,et al. BEAR: Techniques for mitigating bandwidth bloat in gigascale DRAM caches , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[36] Ludmila Cherkasova,et al. ProfDP: A Lightweight Profiler to Guide Data Placement in Heterogeneous Memory Systems , 2018, ICS.
[37] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[38] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.