Automated Code Engine for Graphical Processing Units: Application to the Effective Core Potential Integrals and Gradients.

We present an automated code engine (ACE) that generates optimized kernels for computing integrals in electronic structure theory on a given graphical processing unit (GPU) computing platform. The code generator in ACE creates multiple code variants with different trade-offs between memory usage and floating-point operations. Code generation is built on a graph representation, which allows the generator to be extended to various types of integrals. The code optimizer in ACE determines the optimal code variant and GPU configuration for a given platform by scanning over all possible code candidates and choosing the best-performing candidate for each kernel. We apply ACE to the optimization of effective core potential integrals and gradients. We observe that the best code candidate varies with angular momentum, floating-point precision, and the type of GPU used, which suggests that ACE may be a powerful tool for adapting to fast-evolving GPU architectures.
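
The selection step described above, scanning every code candidate and keeping the fastest one per kernel, can be illustrated with a minimal Python sketch. This is not the ACE implementation: the function `pick_best`, the variant names `"recompute"` and `"store"`, the kernel labels, and the cost model `toy_cost` are all hypothetical stand-ins, and in practice `measure` would be a timing of a generated kernel on the target GPU.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

# A candidate pairs a code variant with a launch configuration.
# Both fields are illustrative assumptions, not ACE's actual search space.
Candidate = Tuple[str, int]  # (variant name, threads per block)


def pick_best(
    kernels: Sequence[str],
    variants: Sequence[str],
    block_sizes: Sequence[int],
    measure: Callable[[str, str, int], float],
) -> Dict[str, Candidate]:
    """Exhaustively scan all (variant, block size) pairs for each kernel
    and keep the pair with the lowest measured cost."""
    best: Dict[str, Candidate] = {}
    for kernel in kernels:
        candidates = product(variants, block_sizes)
        best[kernel] = min(
            candidates, key=lambda c: measure(kernel, c[0], c[1])
        )
    return best


def toy_cost(kernel: str, variant: str, block_size: int) -> float:
    """Hypothetical cost model standing in for a real GPU timing run.
    It makes the optimum depend on the kernel, mimicking the observation
    that the best candidate changes with angular momentum."""
    preferred = {"sp": ("recompute", 128), "dd": ("store", 64)}[kernel]
    variant_penalty = 0.0 if variant == preferred[0] else 1.0
    return variant_penalty + abs(block_size - preferred[1]) / 1000.0


if __name__ == "__main__":
    result = pick_best(
        kernels=["sp", "dd"],
        variants=["recompute", "store"],
        block_sizes=[64, 128, 256],
        measure=toy_cost,
    )
    for kernel, (variant, block) in result.items():
        print(f"{kernel}: variant={variant}, threads_per_block={block}")
```

Passing the measurement in as a callable keeps the search independent of how candidates are timed, so the same loop could be rerun per GPU model or per floating-point precision.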
