Exploiting GPUs with the Super Instruction Architecture

The Super Instruction Architecture (SIA) is a parallel programming environment designed for problems in computational chemistry that involve complicated expressions defined in terms of tensors. Tensors are represented by multidimensional arrays, which are typically very large. The SIA consists of a domain-specific programming language, the Super Instruction Assembly Language (SIAL), and its runtime system, the Super Instruction Processor (SIP). An important feature of SIAL is that algorithms are expressed in terms of blocks (or tiles) of multidimensional arrays rather than individual floating-point numbers. In this paper, we describe how the SIA was enhanced to exploit GPUs, obtaining speedups ranging from a factor of two to nearly four for computational chemistry calculations and thus saving hours of elapsed time on large-scale computations. The results provide evidence that the “programming-with-blocks” approach embodied in the SIA will remain successful in modern, heterogeneous computing environments.
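As a rough illustration of the programming-with-blocks idea, the sketch below performs a matrix contraction one tile at a time in plain Python. It is not SIAL syntax and does not reflect the SIP implementation; the names `block_ranges`, `contract_block`, and `blocked_contraction` are hypothetical and chosen only for this example. The point is that the outer loops range over blocks, and each block-level operation is a self-contained unit of work that a runtime could, in principle, hand to a CPU routine or a GPU kernel.

```python
def block_ranges(n, block):
    """Yield (start, stop) pairs that tile range(n) in segments of at most `block`."""
    for start in range(0, n, block):
        yield start, min(start + block, n)


def contract_block(a, b, c, rows, inner, cols):
    """Hypothetical stand-in for a block-level operation.

    Accumulates the product of one block of `a` and one block of `b`
    into the matching block of `c`.
    """
    (i0, i1), (k0, k1), (j0, j1) = rows, inner, cols
    for i in range(i0, i1):
        for j in range(j0, j1):
            acc = 0.0
            for k in range(k0, k1):
                acc += a[i][k] * b[k][j]
            c[i][j] += acc


def blocked_contraction(a, b, block=2):
    """Compute c[i][j] = sum_k a[i][k] * b[k][j] by looping over blocks."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for rows in block_ranges(n, block):
        for cols in block_ranges(p, block):
            for inner in block_ranges(m, block):
                # Each call touches only one tile of each array.
                contract_block(a, b, c, rows, inner, cols)
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_contraction(a, b, block=2))
```

In this sketch the algorithm never indexes individual numbers outside `contract_block`, so changing where a block is computed would only require replacing that one function.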
