Performance characterization of global address space applications: a case study with NWChem
Sriram Krishnamoorthy | Allen D. Malony | Sameer Shende | Jeff R. Hammond | Nichols A. Romero
[1] Anoop Gupta,et al. SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.
[2] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[3] M. Head‐Gordon,et al. A fifth-order perturbation comparison of electron correlation theories , 1989 .
[4] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[5] Katherine A. Yelick,et al. Titanium: A High-performance Java Dialect , 1998, Concurr. Pract. Exp..
[6] Trygve Helgaker,et al. Molecular Electronic-Structure Theory , 2000 .
[7] Jeff R. Hammond,et al. Coupled-cluster response theory: parallel algorithms and novel applications , 2009 .
[8] Mark S. Gordon,et al. Parallel algorithm for integral transformations and GUGA MCSCF , 1994 .
[9] Robert J. Harrison,et al. Liquid water: obtaining the right answer for the right reasons , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[10] Péter Kacsuk,et al. Distributed and parallel systems: from instruction parallelism to cluster computing , 2000 .
[11] Rick Kufrin. Measuring and improving application performance with PerfSuite , 2005 .
[12] Bernd Mohr,et al. A Tool Framework for Static and Dynamic Analysis of Object-Oriented Software with Templates , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[13] Allen D. Malony,et al. The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl..
[14] Guy L. Steele,et al. Parallel Programming and Parallel Abstractions in Fortress , 2005, IEEE PACT.
[15] Allen D. Malony,et al. Design and Implementation of a Hybrid Parallel Performance Measurement System , 2010, 2010 39th International Conference on Parallel Processing.
[16] IBM Blue Gene Team. Overview of the IBM Blue Gene/P Project , 2008, IBM J. Res. Dev..
[17] Michael J. Frisch,et al. An improved criterion for evaluating the efficiency of two-electron integral algorithms , 1993 .
[18] Robyn R. Lutz,et al. Generalized portable shmem library for high performance computing , 2003 .
[19] Jürgen Gauss,et al. Parallel Calculation of CCSD and CCSD(T) Analytic First and Second Derivatives. , 2008, Journal of chemical theory and computation.
[20] Bernd Mohr,et al. The Scalasca performance toolset architecture , 2010, Concurr. Comput. Pract. Exp..
[21] Robert J. Harrison,et al. Global Arrays: a portable "shared-memory" programming model for distributed memory computers , 1994, Proceedings of Supercomputing '94.
[22] Robert J. Harrison,et al. Computational chemistry at the petascale: Are we there yet? , 2009 .
[23] David H. Bailey,et al. The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[24] Allen D. Malony,et al. Instrumentation and Measurement Strategies for Flexible and Portable Empirical Performance Evaluation , 2001 .
[25] Michael J. Frisch,et al. Ab Initio Quantum Chemistry on a Workstation Cluster , 1995 .
[26] R J Bartlett,et al. Parallel implementation of electronic structure energy, gradient, and Hessian calculations. , 2008, The Journal of chemical physics.
[27] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.
[28] R. Bartlett,et al. Recursive intermediate factorization and complete computational linearization of the coupled-cluster single, double, triple, and quadruple excitation equations , 1991 .
[29] Kaivalya M. Dixit,et al. The SPEC benchmarks , 1991, Parallel Comput..
[30] Jeffrey S. Vetter,et al. Enabling a highly-scalable global address space model for petascale computing , 2010, CF '10.
[31] Message Passing Interface Forum. MPI: A message-passing interface standard , 1994 .
[32] Bryan Carpenter,et al. ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-Time Systems , 1999, IPPS/SPDP Workshops.
[33] Robert J. Harrison,et al. Parallel direct four-index transformations , 1996 .
[34] Jarek Nieplocha,et al. Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit , 2006, Int. J. High Perform. Comput. Appl..
[35] Sriram Krishnamoorthy,et al. Scalable work stealing , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[36] Dieter Kranzlmüller,et al. Tools for Scalable Parallel Program Analysis - Vampir VNG and DeWiz , 2004, DAPSYS.
[37] Robert J. Fowler,et al. HPCToolkit: Multi-platform Tools for Profile-based Performance Analysis , 2003 .
[38] Philip Heidelberger,et al. The deep computing messaging framework: generalized scalable message passing on the blue gene/P supercomputer , 2008, ICS '08.
[39] Allen D. Malony,et al. Performance Technology for Complex Parallel and Distributed Systems , 2000, Scalable Comput. Pract. Exp..
[40] Katherine Yelick,et al. UPC Language Specifications V1.1.1 , 2003 .
[41] Allen D. Malony,et al. Portable profiling and tracing for parallel, scientific applications using C++ , 1998, SPDT '98.
[42] Alistair P. Rendell,et al. A direct coupled cluster algorithm for massively parallel computers , 1997 .
[43] M. Ratner. Molecular electronic-structure theory , 2000 .
[44] Guy L. Steele. Parallel Programming and Parallel Abstractions in Fortress , 2005, IEEE PACT.
[45] Mark S. Gordon,et al. Parallel algorithm for integral transformations and GUGA MCSCF , 1994 .
[46] Peter M. W. Gill,et al. Molecular Integrals over Gaussian Basis Functions , 1994 .
[47] J. Hammond,et al. Coupled‐Cluster Calculations for Large Molecular and Extended Systems , 2011 .
[48] Robert J. Harrison,et al. Parallel computing in quantum chemistry - Message passing and beyond for a general ab initio program system , 1994, Future generations computer systems.
[49] R. Harrison,et al. Ab Initio Molecular Electronic Structure on Parallel Computers , 1994 .
[50] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..
[51] Mark S. Gordon,et al. Coupled cluster algorithms for networks of shared memory parallel processors , 2007, Comput. Phys. Commun..
[52] Jack J. Dongarra,et al. A Portable Programming Interface for Performance Evaluation on Modern Processors , 2000, Int. J. High Perform. Comput. Appl..
[53] Robert J. Harrison,et al. Moving beyond message passing. Experiments with a distributed-data model , 1993 .