A performance optimization framework for compilation of tensor contraction expressions into parallel

This paper discusses a program synthesis system that facilitates the generation of high-performance parallel programs for a class of computations encountered in quantum chemistry and physics. These computations can be expressed as sets of tensor contractions and arise in electronic structure modeling. The paper gives an overview of the synthesis system, which is under development and will take a high-level specification of the computation as input and generate high-performance parallel code for a number of target architectures. It then describes several components of the system, focusing on the compile-time optimization issues that they address.
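As a rough illustration of what a tensor contraction is and why compile-time choices matter, the sketch below evaluates one small contraction in two ways. It is not taken from the paper: the contraction, the array contents, and the names `contract_direct` and `contract_factored` are assumptions chosen only to make the idea concrete.

```python
from itertools import product


def contract_direct(A, B, C, n):
    """S[a][b] = sum over i, j of A[a][i] * B[i][j] * C[j][b], as one 4-deep loop nest."""
    S = [[0] * n for _ in range(n)]
    mults = 0
    for a, b, i, j in product(range(n), repeat=4):
        S[a][b] += A[a][i] * B[i][j] * C[j][b]
        mults += 2  # two multiplications per innermost iteration
    return S, mults


def contract_factored(A, B, C, n):
    """Same result, computed through an intermediate T[a][j] = sum over i of A[a][i] * B[i][j]."""
    T = [[0] * n for _ in range(n)]
    mults = 0
    for a, j, i in product(range(n), repeat=3):
        T[a][j] += A[a][i] * B[i][j]
        mults += 1
    S = [[0] * n for _ in range(n)]
    for a, b, j in product(range(n), repeat=3):
        S[a][b] += T[a][j] * C[j][b]
        mults += 1
    return S, mults


# Hypothetical integer inputs, so both orderings agree exactly.
n = 4
A = [[a + i for i in range(n)] for a in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]
C = [[j - b for b in range(n)] for j in range(n)]

S1, m1 = contract_direct(A, B, C, n)
S2, m2 = contract_factored(A, B, C, n)
print(S1 == S2, m1, m2)
```

Both functions produce the same tensor, but the factored form needs on the order of n^3 multiplications instead of n^4, at the price of storing the intermediate `T`. Choosing among such equivalent evaluation orders is one example of the kind of decision a synthesis system could make at compile time.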
