Making Sense of Performance Counter Measurements on Supercomputing Applications
James C. Browne | Martin Burtscher | Byoung-Do Kim | John D. McCalpin | Stephen W. Keckler | Jeff Diamond
[1] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[2] Allen D. Malony, et al. The Tau Parallel Performance System, 2006, Int. J. High Perform. Comput. Appl.
[3] Vijay Janapa Reddi, et al. PIN: a binary instrumentation tool for computer architecture research and education, 2004, WCAE '04.
[4] Michael Lang, et al. Entering the petaflop era: The architecture and performance of Roadrunner, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Nathan R. Tallent, et al. Binary analysis for measurement and attribution of program performance, 2009, PLDI '09.
[6] Samuel Williams, et al. Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms, 2009, J. Parallel Distributed Comput.
[7] Matthias Hauswirth, et al. We have it easy, but do we have it right?, 2008, 2008 IEEE International Symposium on Workload Characterization.
[8] John L. Gustafson, et al. Reevaluating Amdahl's law, 1988, CACM.
[9] Ramkumar Jayaseelan, et al. Investigating the impact of code generation on performance characteristics of integer programs, 2010, INTERACT-14.
[10] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Nathan R. Tallent, et al. Diagnosing performance bottlenecks in emerging petascale applications, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[12] Susan L. Graham, et al. Gprof: A call graph execution profiler, 1982, SIGPLAN '82.
[13] Nathan R. Tallent, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs, 2010. http://hpctoolkit.org
[14] J. Hack, et al. Description of the NCAR Community Climate Model (CCM1), 1987.
[15] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[16] G. Amdahl, et al. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).
[17] J. Dongarra, et al. The Impact of Multicore on Computational Science Software, 2007.