System-wide Introspection for Accurate Attribution of Performance Bottlenecks
Anirban Mandal | Robert Fowler (RENCI) | Allan Porterfield (RENCI)
[1] Jeffrey S. Vetter, et al. A Dynamic Tracing Mechanism for Performance Analysis of OpenMP Applications, 2001, WOMPAT.
[2] Robert G. Edwards, et al. The Chroma Software System for Lattice QCD, 2004.
[3] Jeffrey S. Vetter, et al. Statistical scalability analysis of communication operations in distributed applications, 2001, PPoPP '01.
[4] Douglas Thain, et al. Qthreads: An API for programming with millions of lightweight threads, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[5] Aaas News, et al. Book Reviews, 1893, Buffalo Medical and Surgical Journal.
[6] B. P. Miller, et al. MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[7] Martin Schulz, et al. Scalable load-balance measurement for SPMD codes, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Bronis R. de Supinski, et al. A ROSE-Based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries, 2010, IWOMP.
[9] S. Eranian. Perfmon2: a flexible performance monitoring interface for Linux, 2010.
[10] Daniel Bedard, et al. PowerMon: Fine-grained and integrated power monitoring for commodity computer systems, 2010, Proceedings of the IEEE SoutheastCon 2010 (SoutheastCon).
[11] Nathan R. Tallent, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs, 2010, Concurr. Comput. Pract. Exp.
[13] Lars Koesterke, et al. PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Samuel Williams, et al. Lattice Boltzmann simulation optimization on leading multicore platforms, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[15] Barton P. Miller, et al. Tree-based overlay networks for scalable applications, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[16] Barton P. Miller, et al. The Paradyn Parallel Performance Measurement Tool, 1995, Computer.
[17] Robert J. Fowler, et al. Modeling memory concurrency for multi-socket multi-core systems, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[18] Karsten Schwan, et al. Falcon: On-line monitoring for steering parallel programs, 1998, Concurr. Pract. Exp.
[19] Alejandro Duran, et al. Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP, 2009, 2009 International Conference on Parallel Processing.
[20] Allen D. Malony, et al. The Tau Parallel Performance System, 2006, Int. J. High Perform. Comput. Appl.
[21] Wolfgang E. Nagel, et al. VAMPIR: Visualization and Analysis of MPI Resources, 2010.