Framework for a productive performance optimization
[1] Nathan R. Tallent,et al. HPCToolkit: performance tools for scientific computing , 2008 .
[2] Juan Gonzalez,et al. On-line detection of large-scale parallel application's structure , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[3] Alex Ramírez,et al. On the memory system requirements of future scientific applications: Four case-studies , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[4] James E. Smith,et al. A performance counter architecture for computing accurate CPI components , 2006, ASPLOS XII.
[5] Allen D. Malony,et al. The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl..
[6] Juan Gonzalez,et al. Automatic detection of parallel applications computation phases , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[7] Juan Gonzalez,et al. Performance Data Extrapolation in Parallel Codes , 2010, 2010 IEEE 16th International Conference on Parallel and Distributed Systems.
[8] Wenguang Chen,et al. OpenUH: an optimizing, portable OpenMP compiler , 2007, Concurr. Comput. Pract. Exp..
[9] Michael Stumm,et al. Online performance analysis by statistical sampling of microprocessor performance counters , 2005, ICS '05.
[10] Interner Bericht. VAMPIR: Visualization and Analysis of MPI Resources , 1996 .
[11] Jesús Labarta,et al. Detailed Performance Analysis Using Coarse Grain Sampling , 2009, Euro-Par Workshops.
[12] Juan Gonzalez,et al. Automatic Evaluation of the Computation Structure of Parallel Applications , 2009, 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies.
[13] Bernd Mohr,et al. Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications , 2008, Parallel Tools Workshop.
[14] Sandia Report,et al. Improving Performance via Mini-applications , 2009 .
[15] Jesús Labarta,et al. Unveiling Internal Evolution of Parallel Application Computation Phases , 2011, 2011 International Conference on Parallel Processing.
[16] Pratap Pattnaik,et al. High-Performance Sorting Algorithms on AIX , 2008 .
[17] A. Mericas,et al. Workload characterization for the design of future servers , 2005, Proceedings of the IEEE International Workload Characterization Symposium, 2005.
[18] Stijn Eyerman,et al. Mechanistic-empirical processor performance modeling for constructing CPI stacks on real hardware , 2011, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[19] M. J. Astrophysik,et al. Deceleration of arbitrarily magnetized GRB ejecta: the complete evolution , 2008, arXiv:0810.2961.
[20] Sverre Jarp. A Methodology for using the Itanium-2 Performance Counters for Bottleneck Analysis , 2002 .
[21] Gokul B. Kandiraju,et al. IBM Research Report High-Performance Sorting Algorithms on AIX , 2008 .
[22] Charles Yount,et al. Using Model Trees for Computer Architecture Performance Analysis of Software Applications , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.
[23] Toni Cortes,et al. PARAVER: A Tool to Visualize and Analyze Parallel Code , 2007 .
[24] Allen D. Malony,et al. Capturing performance knowledge for automated analysis , 2008, HiPC 2008.
[25] Alejandro Duran,et al. Ompss: a Proposal for Programming Heterogeneous Multi-Core Architectures , 2011, Parallel Process. Lett..
[26] Cycle Accounting Analysis on Intel® Core™ 2 Processors .
[27] Jack J. Dongarra,et al. A Portable Programming Interface for Performance Evaluation on Modern Processors , 2000, Int. J. High Perform. Comput. Appl..
[28] Lars Koesterke,et al. PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.