Tools for machine-learning-based empirical autotuning and specialization

The process of empirical autotuning results in the generation of many code variants which are tested, found to be suboptimal, and discarded. By retaining annotated performance profiles of each variant tested over the course of many autotuning runs of the same code across different hardware environments and different input datasets, we can apply machine learning algorithms to generate classifiers for runtime selection of code variants from a library, generate specialized variants, and potentially speed the process of autotuning by starting the search from a point predicted to be close to optimal. In this paper, we show how the TAU Performance System suite of tools can be applied to autotuning to enable reuse of performance data generated through autotuning.

[1]  Jie Wang,et al.  Optimizing MPI Runtime Parameter Settings by Using Machine Learning , 2009, PVM/MPI.

[2]  Allen D. Malony,et al.  The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl..

[3]  P. Sadayappan,et al.  Annotation-based empirical performance tuning using Orio , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[4]  Chun Chen,et al.  Auto-tuning full applications: A case study , 2011, Int. J. High Perform. Comput. Appl..

[5]  Chun Chen,et al.  Speeding up Nek5000 with autotuning and specialization , 2010, ICS '10.

[6]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[7]  Mark Stephenson,et al.  Predicting unroll factors using supervised classification , 2005, International Symposium on Code Generation and Optimization.

[8]  Michael F. P. O'Boyle,et al.  Rapidly Selecting Good Compiler Optimizations using Performance Counters , 2007, International Symposium on Code Generation and Optimization (CGO'07).

[9]  Chun Chen,et al.  A Programming Language Interface to Describe Transformations and Code Generation , 2010, LCPC.

[10]  Mary W. Hall,et al.  CHiLL : A Framework for Composing High-Level Loop Transformations , 2007 .

[11]  Archana Ganapathi,et al.  A case for machine learning to optimize multicore performance , 2009 .

[12]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[13]  Jack J. Dongarra,et al.  Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.

[14]  Allen D. Malony,et al.  An experimental approach to performance measurement of heterogeneous parallel applications using CUDA , 2010, ICS '10.

[15]  Ananta Tiwari,et al.  Online Adaptive Code Generation and Tuning , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[16]  Richard W. Vuduc,et al.  Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization , 2009, LCPC.

[17]  Ananta Tiwari,et al.  End-to-End Auto-Tuning with Active Harmony , 2010 .

[18]  Allen D. Malony,et al.  Design and implementation of a parallel performance data management framework , 2005, 2005 International Conference on Parallel Processing (ICPP'05).

[19]  P. Sadayappan,et al.  Optimal loop unrolling for GPGPU programs , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[20]  Jack J. Dongarra,et al.  Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..

[21]  Vahid Tabatabaee,et al.  Tuning parallel applications in parallel , 2009, Parallel Comput..

[22]  Allen D. Malony,et al.  ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis , 2003, Euro-Par.

[23]  D. Qainlant,et al.  ROSE: Compiler Support for Object-Oriented Frameworks , 1999 .

[24]  Jack J. Dongarra,et al.  A Portable Programming Interface for Performance Evaluation on Modern Processors , 2000, Int. J. High Perform. Comput. Appl..

[25]  Steven G. Johnson,et al.  The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.

[26]  Allen D. Malony,et al.  PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[27]  Bernd Mohr,et al.  A Tool Framework for Static and Dynamic Analysis of Object-Oriented Software with Templates , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[28]  François Bodin,et al.  A Machine Learning Approach to Automatic Production of Compiler Heuristics , 2002, AIMSA.