Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers
Niclas Jansson | Artur Podobas | Steven W. D. Chien | Martin Svedin | Gibson Chikafa
[1] Matt Martineau,et al. Benchmarking the NVIDIA V100 GPU and Tensor Cores , 2018, Euro-Par Workshops.
[2] Laszlo Gyongyosi,et al. A Survey on quantum computing technology , 2019, Comput. Sci. Rev.
[3] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[4] Mark Bohr,et al. A 30 Year Retrospective on Dennard's MOSFET Scaling Paper , 2007, IEEE Solid-State Circuits Newsletter.
[5] Xinxin Mei,et al. Dissecting GPU Memory Hierarchy Through Microbenchmarking , 2015, IEEE Transactions on Parallel and Distributed Systems.
[6] Michael J. Flynn,et al. Some Computer Organizations and Their Effectiveness , 1972, IEEE Transactions on Computers.
[7] Gu-Yeon Wei,et al. Benchmarking TPU, GPU, and CPU Platforms for Deep Learning , 2019, ArXiv.
[8] Terry Cojean,et al. Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations , 2020, 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).
[9] Satoshi Matsuoka,et al. Evaluating high-level design strategies on FPGAs for high-performance computing , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[10] Zhongliang Chen,et al. NUPAR: A Benchmark Suite for Modern GPU Architectures , 2015, ICPE.
[11] Satoshi Matsuoka,et al. From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era , 2016, Conf. Computing Frontiers.
[12] Shaohuai Shi,et al. Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training , 2019, 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID).
[13] Jeffrey S. Vetter,et al. NVIDIA Tensor Core Programmability, Performance & Precision , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[14] Massimiliano Fatica,et al. Implementing the Himeno benchmark with CUDA on GPU clusters , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[15] Marco Maggioni,et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking , 2018, ArXiv.
[16] Satoshi Matsuoka,et al. Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? , 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[17] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006.
[18] Kentaro Sano,et al. A Survey on Coarse-Grained Reconfigurable Architectures From a Performance Perspective , 2020, IEEE Access.
[19] Jack Choquette,et al. NVIDIA A100 GPU: Performance & Innovation for GPU Computing , 2020, 2020 IEEE Hot Chips 32 Symposium (HCS).
[20] Terry Cojean,et al. Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse Linear Algebra Computations , 2020, ArXiv.
[21] Mats Brorsson,et al. Empowering OpenMP with automatically generated hardware , 2016, 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS).
[22] Xu Liu,et al. Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite , 2018, 2018 IEEE International Symposium on Workload Characterization (IISWC).
[23] Catherine D. Schuman,et al. A Survey of Neuromorphic Computing and Neural Networks in Hardware , 2017, ArXiv.
[24] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[25] Satoshi Matsuoka,et al. Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches? , 2018, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[26] Christian Plessl,et al. High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection , 2020, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[27] R. Schaller,et al. Moore's law: past, present and future , 1997.
[28] Jason Helge Anderson,et al. LegUp: An open-source high-level synthesis tool for FPGA-based processor/accelerator systems , 2013, TECS.
[29] Dhabaleswar K. Panda,et al. OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters , 2012, EuroMPI.