Templating and Automatic Code Generation for Performance with Python

Parameterizing source code for architecture-bound optimization is a common approach to high-performance programming, but one that makes the programmer’s task arduous and the resulting code difficult to maintain. Certain parameterizations, such as changing loop order, may require elaborate code instrumentation that distracts from the main objective. In this paper, we propose a templating and automatic code generation approach based on standard Python modules and the Opal library for algorithm optimization. Advantages of our approach include its programmatic simplicity and the flexibility offered by the templating engine. We provide a complete example for matrix multiplication in which the code is optimized with respect to blocking, loop unrolling, and compiler flags.
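
As a rough illustration of the templating idea, the sketch below uses only the standard `string.Template` class to generate variants of a blocked C matrix-multiply kernel. It is not the paper's implementation and does not use the Opal API: the template text, the function names (`loop_nest`, `generate`, `variants`), and the candidate block sizes, loop orders, and compiler flags are all assumptions chosen for illustration.

```python
from itertools import product
from string import Template

# Hypothetical C kernel template; the $-placeholders are filled in by Python.
KERNEL = Template("""\
void matmul(int n, const double *A, const double *B, double *C) {
  for (int ii = 0; ii < n; ii += $block)
    for (int kk = 0; kk < n; kk += $block)
      for (int jj = 0; jj < n; jj += $block)
$loops
              C[i*n + j] += A[i*n + k] * B[k*n + j];
}
""")


def loop_nest(order, block):
    """Emit the three intra-block loops in the requested order, e.g. 'ikj'."""
    lines = []
    for depth, var in enumerate(order):
        indent = " " * (8 + 2 * depth)
        lines.append(
            f"{indent}for (int {var} = {var}{var}; "
            f"{var} < {var}{var} + {block} && {var} < n; ++{var})"
        )
    return "\n".join(lines)


def generate(order="ikj", block=32):
    """Return C source for one (loop order, block size) variant."""
    if sorted(order) != ["i", "j", "k"]:
        raise ValueError("order must be a permutation of 'ijk'")
    return KERNEL.substitute(block=block, loops=loop_nest(order, block))


def variants(orders=("ijk", "ikj"), blocks=(16, 32),
             flag_sets=(("-O2",), ("-O3", "-funroll-loops"))):
    """Enumerate an assumed search space: loop order x block size x flags."""
    for order, block, flags in product(orders, blocks, flag_sets):
        yield order, block, flags, generate(order, block)


if __name__ == "__main__":
    print(generate("ikj", 8))
```

Because the loop order is just a string passed to `generate`, changing it requires no edits to the kernel itself. In a tuning setup, each tuple produced by `variants` would be written to a file, compiled with its flags, and timed, with a search tool choosing which variants to try; that step is omitted here.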
