Need for Speed: Experiences Building a Trustworthy System-Level GPU Simulator
David W. Nellans | Daniel Lustig | Niladrish Chatterjee | Oreste Villa | Evgeny Bolotin | Yaosheng Fu | Zi Yan | Nan Jiang
[1] Matthew Poremba,et al. Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[2] Hsien-Hsin S. Lee,et al. GPUMech: GPU Performance Modeling Technique Based on Interval Analysis , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[3] Xun Gong,et al. Multi2Sim Kepler: A detailed architectural GPU simulator , 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[4] David R. Kaeli,et al. Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[5] Alois Knoll,et al. A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[6] Brad Calder,et al. Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[7] Rajeev Balasubramonian,et al. Managing DRAM Latency Divergence in Irregular GPGPU Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Cody Coleman,et al. MLPerf Inference Benchmark , 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[9] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[10] Sean White,et al. ‘Zeppelin’: An SoC for multichip architectures , 2018, 2018 IEEE International Solid-State Circuits Conference (ISSCC).
[11] Tor M. Aamodt,et al. Emerald: Graphics Modeling for SoC Systems , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[12] C. Li,et al. The Demand for a Sound Baseline in GPU Memory Architecture Research , 2017 .
[13] Onur Mutlu,et al. Ramulator: A Fast and Extensible DRAM Simulator , 2016, IEEE Computer Architecture Letters.
[14] Sarita V. Adve,et al. HeteroSync: A benchmark suite for fine-grained synchronization on tightly coupled GPUs , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[15] Lifan Xu,et al. Auto-tuning a high-level language targeted to GPU codes , 2012, 2012 Innovative Parallel Computing (InPar).
[16] Karthikeyan Sankaralingam,et al. "Your favorite simulator here" Considered Harmful , 2014 .
[17] Nam Sung Kim,et al. GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.
[18] Stephen W. Keckler,et al. Page Placement Strategies for GPUs within Heterogeneous Memory Systems , 2015, ASPLOS.
[19] David W. Nellans,et al. Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs , 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[20] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010 .
[21] Aamer Jaleel,et al. Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[22] David Patterson,et al. MLPerf Training Benchmark , 2019, MLSys.
[23] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[24] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[25] William J. Dally,et al. Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Christoforos E. Kozyrakis,et al. ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.
[27] David W. Nellans,et al. Towards high performance paged memory for GPUs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[28] Simone Secchi,et al. Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer , 2012, IEEE Transactions on Parallel and Distributed Systems.
[29] B. Jacob,et al. CMP$im: A Pin-Based On-The-Fly Multi-Core Cache Simulator , 2008 .
[30] Thomas F. Wenisch,et al. Selective GPU caches to eliminate CPU-GPU HW cache coherence , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[31] Akshay Jain,et al. A Quantitative Evaluation of Contemporary GPU Simulation Methodology , 2018, SIGMETRICS.
[32] Louai Alarabi. Summit , 2018, SIGSPATIAL Special.
[33] Aamer Jaleel,et al. Beyond the Socket: NUMA-Aware GPUs , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[34] Xiaojin Zhu,et al. Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[35] David A. Wood,et al. gem5-gpu: A Heterogeneous CPU-GPU Simulator , 2015, IEEE Computer Architecture Letters.
[36] Shunfei Chen,et al. MARSS: A full system simulator for multicore x86 CPUs , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[37] Carole-Jean Wu,et al. MCM-GPU: Multi-chip-module GPUs for continued performance scalability , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[38] Lieven Eeckhout,et al. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[39] Yao Zhang,et al. A quantitative performance analysis model for GPU architectures , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[40] Derek Chiou,et al. GPGPU performance and power estimation using machine learning , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[41] Seth H. Pugsley,et al. USIMM: the Utah SImulated Memory Module , 2012 .
[42] Nan Jiang,et al. A detailed and flexible cycle-accurate Network-on-Chip simulator , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[43] Thomas F. Wenisch,et al. SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.
[44] David A. Wood,et al. Full-system timing-first simulation , 2002, SIGMETRICS '02.
[45] Aamer Jaleel,et al. HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[46] Tor M. Aamodt,et al. Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[47] Carlos González,et al. ATTILA: a cycle-level execution-driven simulator for modern GPU architectures , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[48] Kevin Skadron,et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[49] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010, ISCA.
[50] Oreste Villa,et al. NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs , 2019, MICRO.
[51] Bruce Jacob,et al. DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator , 2020, IEEE Computer Architecture Letters.
[52] Keshav Pingali,et al. A quantitative study of irregular programs on GPUs , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).