AARTS: low overhead online adaptive auto-tuning

We present a lightweight online auto-tuning system for shared-memory parallel programs. It employs an online adaptive tuning algorithm, based on performance measurements, to adapt to performance variability that arises during program execution. We address the impact of synchronous versus asynchronous interaction between the application and the tuning system, and describe an adaptive approach that combines the benefits of both options. We present a performance study of the online tuning system and compare it to synchronous tuning systems. Finally, we evaluate AARTS under different scenarios, showing the potential benefits of online tuning and the ability of AARTS to exploit them.
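To make the idea of measurement-driven online tuning concrete, the following is a minimal sketch and not the AARTS algorithm itself. It assumes a single integer tuning parameter (for example a chunk size), a hypothetical `OnlineTuner` class, and a simple synchronous protocol in which the application asks for a configuration, runs one step, and reports the measured time. The `drift` threshold that triggers re-exploration is likewise an illustrative assumption.

```python
from typing import Callable, Dict, List, Optional


class OnlineTuner:
    """Hypothetical online tuner: measure each candidate, then keep the fastest.

    Exploration restarts when the current best slows down by more than
    `drift` (relative), a crude stand-in for adapting to runtime variability.
    """

    def __init__(self, candidates: List[int], drift: float = 0.5) -> None:
        if not candidates:
            raise ValueError("need at least one candidate")
        self.candidates = list(candidates)
        self.drift = drift
        self.times: Dict[int, float] = {}        # latest measurement per candidate
        self.best: Optional[int] = None          # fastest candidate seen so far
        self.best_time: Optional[float] = None   # latest accepted time of `best`

    def next_config(self) -> int:
        # Exploration: hand out the first candidate that has no measurement yet.
        for c in self.candidates:
            if c not in self.times:
                return c
        # Exploitation: every candidate is measured, so `best` is set.
        assert self.best is not None
        return self.best

    def report(self, config: int, elapsed: float) -> None:
        self.times[config] = elapsed
        if len(self.times) < len(self.candidates):
            return  # still exploring
        if config == self.best and elapsed > self.best_time * (1 + self.drift):
            # The best configuration got much slower: discard old measurements.
            self.times.clear()
            self.best = None
            self.best_time = None
            return
        self.best = min(self.times, key=self.times.get)
        self.best_time = self.times[self.best]


def run(tuner: OnlineTuner, cost: Callable[[int, int], float], steps: int) -> List[int]:
    """Drive the tuner with a synthetic cost(step, config) instead of real timing."""
    history = []
    for step in range(steps):
        cfg = tuner.next_config()
        tuner.report(cfg, cost(step, cfg))
        history.append(cfg)
    return history
```

In a real program the `cost` callback would be replaced by timing the tuned region, for instance with `time.perf_counter()`. An asynchronous variant would move `report` processing to a separate thread so the application never waits on the tuner, at the price of acting on slightly stale decisions.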
