Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling
[1] Xun Gong,et al. Multi2Sim Kepler: A detailed architectural GPU simulator , 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[2] Hsien-Hsin S. Lee,et al. GPUMech: GPU Performance Modeling Technique Based on Interval Analysis , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[3] Ronald G. Dreslinski,et al. Sources of error in full-system simulation , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[4] David R. Kaeli,et al. Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[5] David A. Wood,et al. gem5-gpu: A Heterogeneous CPU-GPU Simulator , 2015, IEEE Computer Architecture Letters.
[6] K. Pagiamtzis,et al. A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme , 2004, IEEE Journal of Solid-State Circuits.
[7] Henk Corporaal,et al. A detailed GPU cache model based on reuse distance theory , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[8] Wu-chun Feng,et al. Bounding the effect of partition camping in GPU kernels , 2011, CF '11.
[9] Lizy Kurian John,et al. The virtual write queue: coordinating DRAM and last-level cache policies , 2010, ISCA.
[10] Andreas Moshovos,et al. Demystifying GPU microarchitecture through microbenchmarking , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[11] Scott A. Mahlke,et al. Mascar: Speeding up GPU warps by reducing memory pitstops , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[12] Yingwei Luo,et al. Get Out of the Valley: Power-Efficient Address Mapping for GPUs , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[13] Karthikeyan Sankaralingam,et al. Architectural Simulators Considered Harmful , 2015, IEEE Micro.
[14] C. Li,et al. The Demand for a Sound Baseline in GPU Memory Architecture Research , 2017 .
[15] Carlos González,et al. ATTILA: a cycle-level execution-driven simulator for modern GPU architectures , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[16] Nam Sung Kim,et al. Approximating warps with intra-warp operand value similarity , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[17] Xiaojin Zhu,et al. Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Doug Burger,et al. Measuring Experimental Error in Microprocessor Simulation , 2001, ISCA 2001.
[19] Sarita V. Adve,et al. Chasing Away RAts: Semantics and evaluation for relaxed atomics on heterogeneous systems , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[20] Mattan Erez,et al. A locality-aware memory hierarchy for energy-efficient GPU architectures , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[21] Yao Zhang,et al. A quantitative performance analysis model for GPU architectures , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[22] David W. Nellans,et al. Flexible software profiling of GPU architectures , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[23] Oreste Villa,et al. NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs , 2019, MICRO.
[24] Marco Maggioni,et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking , 2018, ArXiv.
[25] Olivier Giroux,et al. Volta: Performance and Programmability , 2018, IEEE Micro.
[26] Tor M. Aamodt,et al. Emerald: Graphics Modeling for SoC Systems , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[27] Margaret Martonosi,et al. MRPB: Memory request prioritization for massively parallel processors , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[28] Lifan Xu,et al. Auto-tuning a high-level language targeted to GPU codes , 2012, 2012 Innovative Parallel Computing (InPar).
[29] Won Woo Ro,et al. Access pattern-aware cache management for improving data utilization in GPU , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[30] Josep Torrellas,et al. Scalable Cache Miss Handling for High Memory-Level Parallelism , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[31] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[32] Onur Mutlu,et al. DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems , 2010 .
[33] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[34] Tor M. Aamodt,et al. Analyzing Machine Learning Workloads Using a Detailed GPU Simulator , 2018, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[35] Rajeev Balasubramonian,et al. Managing DRAM Latency Divergence in Irregular GPGPU Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Prasun Gera,et al. Performance Characterisation and Simulation of Intel's Integrated GPU Architecture , 2018, 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[37] Tor M. Aamodt,et al. Modeling Deep Learning Accelerator Enabled GPUs , 2018, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[38] B. Ramakrishna Rau,et al. Pseudo-randomly interleaved memory , 1991, ISCA '91.
[39] A. Seznec,et al. Decoupled sectored caches: conciliating low tag implementation cost and low miss ratio , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[40] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[41] John S. Liptay,et al. Structural Aspects of the System/360 Model 85 II: The Cache , 1968, IBM Syst. J..
[42] Lieven Eeckhout,et al. Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[43] Derek Chiou,et al. GPGPU performance and power estimation using machine learning , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[44] Lieven Eeckhout,et al. Racing to Hardware-Validated Simulation , 2019, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[45] Akshay Jain,et al. A Quantitative Evaluation of Contemporary GPU Simulation Methodology , 2018, SIGMETRICS.
[46] Matthew Poremba,et al. Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[47] Alois Knoll,et al. A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[48] Amar Phanishayee,et al. Benchmarking and Analyzing Deep Neural Network Training , 2018, 2018 IEEE International Symposium on Workload Characterization (IISWC).
[49] Xinxin Mei,et al. Dissecting GPU Memory Hierarchy Through Microbenchmarking , 2015, IEEE Transactions on Parallel and Distributed Systems.
[50] Rafael Hector Saavedra-Barrera,et al. CPU performance evaluation and execution time prediction using narrow spectrum benchmarking , 1992 .
[51] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[52] Hyesoon Kim,et al. An integrated GPU power and performance model , 2010, ISCA.
[53] AngryCalc. GeForce GTX TITAN , 2018 .
[54] Mike O'Connor,et al. Cache-Conscious Wavefront Scheduling , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.