Performance Evaluation and Optimization of HBM-Enabled GPU for Data-Intensive Applications
Wenguang Chen | Chao Wang | Yuan Xie | Youwei Zhuo | Maohua Zhu
[1] Brian W. Barrett, et al. Introducing the Graph 500, 2010.
[2] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[3] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[4] Kunle Olukotun, et al. Efficient Parallel Graph Exploration on Multi-Core CPU and GPU, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[5] Gabriel H. Loh, et al. 3D-Stacked Memory Architectures for Multi-core Processors, 2008, 2008 International Symposium on Computer Architecture.
[6] R. Sindhu Reddy, et al. DLAU: A Scalable Deep Learning Accelerator Unit on FPGA, 2018.
[7] H. Howie Huang, et al. Enterprise: breadth-first graph traversal on GPUs, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Ah Chung Tsoi, et al. Ranking Attack Graphs with Graph Neural Networks, 2009, ISPEC.
[9] Kunle Olukotun, et al. Accelerating CUDA graph algorithms at maximum warp, 2011, PPoPP '11.
[10] Maya Gokhale, et al. Hardware Technologies for High-Performance Data-Intensive Computing, 2008, Computer.
[11] Natalia Gimelshein, et al. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[12] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[13] Jason Cong, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.
[14] Yuan Xie, et al. Optimizing GPU energy efficiency with 3D die-stacking graphics memory and reconfigurable memory interface, 2013, TACO.
[15] Tao Wang, et al. Deep learning with COTS HPC systems, 2013, ICML.
[16] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[17] Andrew S. Grimshaw, et al. High-Performance and Scalable GPU Graph Traversal, 2015, ACM Trans. Parallel Comput.
[18] Ninghui Sun, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.