Fast Computational GPU Design with GT-Pin
Harish Patil | Chi-Keung Luk | Melanie Kambadur | Martha A. Kim | Sunpyo Hong | Juan Cabral | Sohaib Sajid
[1] Brad Calder,et al. Structures for phase classification , 2004, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004.
[2] Sudhakar Yalamanchili,et al. A characterization and analysis of PTX kernels , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[3] Hai Jin,et al. GPGPU-MiniBench: Accelerating GPGPU Micro-Architecture Simulation , 2015, IEEE Transactions on Computers.
[4] Ruppa K. Thulasiram,et al. Option Pricing on the GPU , 2010, 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC).
[5] James Cownie,et al. PinPlay: a framework for deterministic replay and reproducible analysis of parallel programs , 2010, CGO '10.
[6] Lieven Eeckhout,et al. BarrierPoint: Sampled simulation of multi-threaded applications , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[7] Brad Calder,et al. SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.
[8] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[9] David Defour,et al. Barra: A Parallel Functional Simulator for GPGPU , 2010, 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.
[10] Yao Zhang,et al. A quantitative performance analysis model for GPU architectures , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[11] Wolfgang Paul,et al. GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model , 2009, J. Comput. Phys.
[12] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[13] Karsten Schwan,et al. Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures , 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[14] Timothy G. Mattson,et al. OpenCL Programming Guide , 2011.
[15] Won Woo Ro,et al. Parallel GPU architecture simulation framework exploiting work allocation unit parallelism , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[16] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[17] Hsien-Hsin S. Lee,et al. TBPoint: Reducing Simulation Time for Large-Scale GPGPU Kernels , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[18] David R. Kaeli,et al. Analyzing program flow within a many-kernel OpenCL application , 2011, GPGPU-4.
[19] Rajiv Kapoor,et al. Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[20] Steve Mann,et al. Computer vision signal processing on graphics processing units , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[21] Gagan Agrawal,et al. A translation system for enabling data mining applications on GPUs , 2009, ICS.
[22] Sudhakar Yalamanchili,et al. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).