Domain‐specific virtual processors as a portable programming and execution model for parallel computational workloads on modern heterogeneous high‐performance computing architectures

[1]  Beverly A. Sanders,et al.  Super instruction architecture of petascale electronic structure software: the story, 2010.

[2]  Al Geist,et al.  Heterogeneous parallel and distributed computing , 1999, Parallel Comput..

[3]  Jack Dongarra,et al.  Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.

[4]  Beverly A. Sanders,et al.  The Super Instruction Architecture: A Framework for High-Productivity Parallel Implementation of Coupled-Cluster Methods on Petascale Computers, 2011.

[5]  Tjerk P. Straatsma,et al.  NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations , 2010, Comput. Phys. Commun..

[6]  James Demmel,et al.  Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[7]  Daniel Etiemble,et al.  Automatic Task-Based Code Generation for High Performance Domain Specific Embedded Language , 2014, International Journal of Parallel Programming.

[8]  Benoît Meister,et al.  The Open Community Runtime: A runtime system for extreme scale computing , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).

[9]  John F. Stanton,et al.  A massively parallel tensor contraction framework for coupled-cluster computations , 2014, J. Parallel Distributed Comput..

[10]  A Marek,et al.  The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science , 2014, Journal of Physics: Condensed Matter.

[11]  Robert A. van de Geijn,et al.  BLIS: A Framework for Rapidly Instantiating BLAS Functionality , 2015, ACM Trans. Math. Softw..

[12]  Mark S. Gordon,et al.  General atomic and molecular electronic structure system , 1993, J. Comput. Chem..

[13]  Beverly A. Sanders,et al.  Exploiting GPUs with the Super Instruction Architecture , 2014, International Journal of Parallel Programming.

[14]  Field G. Van Zee,et al.  BLIS: A Framework for Rapidly Instantiating BLAS Functionality, 2015.

[15]  R J Bartlett,et al.  Parallel implementation of electronic structure energy, gradient, and Hessian calculations. , 2008, The Journal of chemical physics.