NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs
Oreste Villa | Stephen W. Keckler | Mark Stephenson | David Nellans
[1] Edward McLellan. The Alpha AXP architecture and 21064 processor , 1993, IEEE Micro.
[2] Robert Hundt,et al. HP Caliper: a framework for performance analysis tools , 2000, IEEE Concurr..
[3] Kim M. Hazelwood,et al. A dynamic binary instrumentation engine for the ARM architecture , 2006, CASES '06.
[4] Dong Li,et al. Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Nicholas Nethercote,et al. Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.
[6] Karsten Schwan,et al. A framework for dynamically instrumenting GPU compute applications within GPU Ocelot , 2011, GPGPU-4.
[7] B. Jacob,et al. CMP$im: A Pin-Based On-The-Fly Multi-Core Cache Simulator , 2008 .
[8] Jin Huang,et al. Decoding CUDA Binary , 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[9] Jong-Deok Choi,et al. Accurate, efficient, and adaptive calling context profiling , 2006, PLDI '06.
[10] Jeff Johnson,et al. Fast Convolutional Nets With fbfft: A GPU Performance Evaluation , 2014, ICLR.
[11] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.
[12] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[13] Matthias Hauswirth,et al. Low-overhead memory leak detection using adaptive statistical profiling , 2004, ASPLOS XI.
[14] Eugenio Culurciello,et al. An Analysis of Deep Neural Network Models for Practical Applications , 2016, ArXiv.
[15] Sudhakar Yalamanchili,et al. Modeling GPU-CPU workloads and systems , 2010, GPGPU-3.
[16] David W. Nellans,et al. Flexible software profiling of GPU architectures , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[17] Wu-chun Feng,et al. Towards a performance-portable FFT library for heterogeneous computing , 2014, Conf. Computing Frontiers.
[18] Derek Bruening,et al. Efficient, transparent, and comprehensive runtime code manipulation , 2004 .
[19] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[20] Christian Terboven,et al. OpenACC - First Experiences with Real-World Applications , 2012, Euro-Par.
[21] Yul Chu,et al. A flexible multi-core functional cache simulator (FM-SIM) , 2017, SummerSim.
[22] Bronis R. de Supinski,et al. Abstract: Automatically Adapting Programs for Mixed-Precision Floating-Point Computation , 2013, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[23] Andrew Kerr,et al. Translating GPU Binaries to Tiered SIMD Architectures with Ocelot , 2009 .
[24] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[25] Vijay Janapa Reddi,et al. PIN: a binary instrumentation tool for computer architecture research and education , 2004, WCAE '04.
[26] Stephen W. Keckler,et al. SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation , 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[27] Larry Rudolph,et al. How to Do a Million Watchpoints: Efficient Debugging Using Dynamic Instrumentation , 2008, CC.