David R. Kaeli | José L. Abellán | Xiang Gong | Yifan Sun | Vincent Zhao | Saiful A. Mojumder | Ajay Joshi | Shi Dong | John Kim | Rafael Ubal | Trinayan Baruah | Shane Treadway | Yuhui Bao
[1] Y. Lim, et al. FIR filter design over a discrete powers-of-two coefficient space, 1983.
[2] Xiangyu Li, et al. Hetero-mark, a benchmark suite for CPU-GPU collaborative computing, 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).
[3] Roy H. Campbell, et al. A Parallel Implementation of K-Means Clustering on GPUs, 2008, PDPTA.
[4] Larry J. Merville, et al. An Empirical Examination of the Black‐Scholes Call Option Pricing Model, 1979.
[5] Sangpil Lee, et al. Parallel GPU Architecture Simulation Framework Exploiting Architectural-Level Parallelism with Timing Error Prediction, 2016, IEEE Transactions on Computers.
[6] David R. Kaeli, et al. Multi2Sim: A simulation framework for CPU-GPU computing, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[7] Yunsi Fei, et al. Nacre: Durable, Secure and Energy-efficient Non-Volatile Memory Utilizing Data Versioning, 2019, IEEE Transactions on Emerging Topics in Computing.
[8] Aamer Jaleel, et al. Beyond the Socket: NUMA-Aware GPUs, 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[9] David Defour, et al. Barra, a Parallel Functional GPGPU Simulator, 2009.
[10] Michael Garland, et al. Designing efficient sorting algorithms for manycore GPUs, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[11] Hai Jiang, et al. Scaling up MapReduce-based Big Data Processing on Multi-GPU systems, 2014, Cluster Computing.
[12] Matthew Poremba, et al. Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[13] Robert C. Martin. Agile Software Development, Principles, Patterns, and Practices, 2002.
[14] David Kanter. GRAPHICS PROCESSING REQUIREMENTS FOR ENABLING IMMERSIVE VR, 2015.
[15] Antonio J. Peña, et al. Chai: Collaborative heterogeneous applications for integrated-architectures, 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[16] Wen-mei W. Hwu, et al. Heterogeneous System Architecture: A New Compute Platform Infrastructure, 2015.
[17] R. M. Fujimoto, et al. Parallel discrete event simulation, 1989, WSC '89.
[18] Brian Kingsbury, et al. Very deep multilingual convolutional neural networks for LVCSR, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[20] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[21] Carole-Jean Wu, et al. MCM-GPU: Multi-chip-module GPUs for continued performance scalability, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[22] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[23] John Waldron, et al. AES Encryption Implementation and Analysis on Commodity Graphics Processing Units, 2007, CHES.
[24] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[25] Simon See, et al. An Evaluation of Unified Memory Technology on NVIDIA GPUs, 2015, 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[26] Eric A. Brewer, et al. Kubernetes and the path to cloud native, 2015, SoCC.
[27] Dietmar Fey, et al. High Performance Stencil Code Algorithms for GPGPUs, 2011, ICCS.
[28] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[29] Keshav Pingali, et al. Stochastic gradient descent on GPUs, 2015, GPGPU@PPoPP.
[30] Abhinav Vishnu, et al. Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[31] Alessandro Dal Palù, et al. GPU-enhanced Finite Volume Shallow Water solver for fast flood simulations, 2014, Environ. Model. Softw.
[32] Xun Gong, et al. Multi2Sim Kepler: A detailed architectural GPU simulator, 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[33] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[34] Denis Foley, et al. Ultra-Performance Pascal GPU and NVLink Interconnect, 2017, IEEE Micro.
[35] Shengen Yan, et al. Deep Image: Scaling up Image Recognition, 2015, ArXiv.
[36] Smruti R. Sarangi, et al. GpuTejas: A parallel simulator for GPU architectures, 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[37] Eugenio Culurciello, et al. An Analysis of Deep Neural Network Models for Practical Applications, 2016, ArXiv.
[38] Brad Calder, et al. Reproducible simulation of multi-threaded workloads for architecture design exploration, 2008, 2008 IEEE International Symposium on Workload Characterization.
[39] Amnon Barak, et al. Memory access patterns: the missing piece of the multi-GPU puzzle, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[40] Wei Wu, et al. Fast thermal simulation for architecture level dynamic thermal management, 2005, ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005.
[41] Won Woo Ro, et al. Parallel GPU architecture simulation framework exploiting work allocation unit parallelism, 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[42] Keshav Pingali, et al. A quantitative study of irregular programs on GPUs, 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[43] Joonyoung Kim, et al. HBM: Memory solution for bandwidth-hungry processors, 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[44] David R. Kaeli, et al. UMH, 2016, ACM Trans. Archit. Code Optim.
[45] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).