Identifying Optimization Opportunities Within Kernel Execution in GPU Codes
[1] Isaac D. Scherson,et al. Computationally Efficient Multiplexing of Events on Hardware Counters , 2014 .
[2] Matthias S. Müller,et al. The Vampir Performance Analysis Tool-Set , 2008, Parallel Tools Workshop.
[3] Shirley Moore,et al. Non-determinism and overcount on modern hardware performance counter implementations , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[4] Allen D. Malony,et al. Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs , 2011, 2011 International Conference on Parallel Processing.
[5] Richard W. Vuduc,et al. Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU) , 2012, Synthesis Lectures on Computer Architecture.
[6] Sudhakar Yalamanchili,et al. A characterization and analysis of PTX kernels , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[7] David M. Brooks,et al. ISA-independent workload characterization and its implications for specialized architectures , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[8] Steve Plimpton,et al. Fast parallel algorithms for short-range molecular dynamics , 1993 .
[9] John M. Mellor-Crummey,et al. Effective sampling-driven performance tools for GPU-accelerated supercomputers , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[10] Ian Karlin,et al. LULESH Programming Model and Performance Ports Overview , 2012 .
[11] Guido Juckeland,et al. Non-intrusive Performance Analysis of Parallel Hardware Accelerated Applications on Hybrid Architectures , 2010, 2010 39th International Conference on Parallel Processing Workshops.
[12] Hyesoon Kim,et al. Performance Analysis and Tuning for General Purpose Graphics Processing Units , 2012 .
[13] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[14] Allen D. Malony,et al. The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl.
[15] P. Sadayappan,et al. Annotation-based empirical performance tuning using Orio , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[16] Allen D. Malony,et al. Design and Implementation of a Hybrid Parallel Performance Measurement System , 2010, 2010 39th International Conference on Parallel Processing.
[17] Jack J. Dongarra,et al. A Portable Programming Interface for Performance Evaluation on Modern Processors , 2000, Int. J. High Perform. Comput. Appl.