AutoSCOPE: Automatic Suggestions for Code Optimizations using PerfExpert

Automated source-code performance optimization has four stages: measurement, diagnosis of bottlenecks, determination of optimizations, and rewriting of the source code. Each stage must be successfully implemented to enable the next stage. The PerfExpert tool supports automatic performance measurement and bottleneck diagnosis for multicore and multichip compute nodes, i.e., it implements the first two stages. This paper presents AutoSCOPE, a new system that extends PerfExpert by implementing the third stage. Based on PerfExpert’s output, AutoSCOPE automatically determines appropriate source-code optimizations and compiler flags. We describe the process for selecting optimizations and evaluate the effectiveness of AutoSCOPE by applying it to three HPC production codes. Each of these codes is available in unoptimized and manually optimized versions. AutoSCOPE succeeds in selecting the same source-code transformations as were chosen by human experts in most cases. AutoSCOPE is an extensible framework to which additional optimizations and further rules for selecting optimizations can be added.
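The third stage, mapping diagnosed bottlenecks to candidate optimizations, can be pictured as a small rule table. The following is a minimal sketch only, not AutoSCOPE's actual implementation: the category names, the threshold values, the suggestion strings, and the `Rule` and `suggest` names are all hypothetical and are not taken from PerfExpert or AutoSCOPE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical selection rule: bottleneck category -> suggestion."""
    category: str      # assumed name of a diagnosed bottleneck category
    threshold: float   # assumed severity score above which the rule fires
    suggestion: str    # illustrative optimization or compiler-flag hint


# Illustrative rule table; every entry is an assumption made for this sketch.
RULES = [
    Rule("data_accesses", 0.5, "apply loop blocking to improve cache reuse"),
    Rule("branch_instructions", 0.5, "hoist loop-invariant conditionals out of the loop"),
    Rule("floating_point", 0.5, "enable compiler vectorization flags"),
]


def suggest(metrics, rules=RULES):
    """Return suggestions for every rule whose category score meets its
    threshold, ordered from the most severe bottleneck to the least."""
    hits = [
        (metrics[rule.category], rule.suggestion)
        for rule in rules
        if metrics.get(rule.category, 0.0) >= rule.threshold
    ]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [suggestion for _, suggestion in hits]


if __name__ == "__main__":
    # Made-up severity scores for one code section, for illustration only.
    example = {"data_accesses": 1.2, "branch_instructions": 0.1, "floating_point": 0.7}
    for line in suggest(example):
        print(line)
```

Keeping the rules as data rather than control flow mirrors the extensibility described above: under this sketch's assumptions, adding an optimization means appending one entry to the table.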
