Expression Tree Evaluation by Dynamic Code Generation - Are Accelerators Up for the Task?
Josef Weidendorfer | Thomas Müller | Andreas Blaszczyk
[1] R. Bartlett, et al. Coupled-cluster methods that include connected quadruple excitations, T4: CCSDTQ-1 and Q(CCSDT), 1989.
[2] Mihály Kállay, et al. Coupled-cluster methods including noniterative corrections for quadruple excitations, 2005, The Journal of Chemical Physics.
[3] R. Bartlett, et al. The coupled-cluster single, double, and triple excitation model for open-shell single reference functions, 1990.
[4] Bryan Carpenter, et al. ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-Time Systems, 1999, IPPS/SPDP Workshops.
[5] Robert A. van de Geijn, et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm, 1995.
[6] Dorothea Heiss-Czedik, et al. An Introduction to Genetic Algorithms, 1997, Artificial Life.
[7] P. Deuflhard. Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms, 2011.
[8] J. Hammond, et al. Coupled-Cluster Calculations for Large Molecular and Extended Systems, 2011.
[9] R. Bartlett. Coupled-cluster approach to molecular structure and spectra: a step toward predictive quantum chemistry, 1989.
[10] Matteo Frigo, et al. A fast Fourier transform compiler, 1999, PLDI '99.
[11] Dominik Grewe, et al. Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation, 2011, GPGPU-4.
[12] Robert J. Harrison, et al. Global arrays: A nonuniform memory access programming model for high-performance computers, 1996, The Journal of Supercomputing.
[13] Thomas Müller, et al. Convergence behaviour of coupled pressure and thermal networks, 2014.
[14] Tjalling J. Ypma, et al. Historical Development of the Newton-Raphson Method, 1995, SIAM Rev.
[15] Shahid H. Bokhari, et al. On the Mapping Problem, 1981, IEEE Transactions on Computers.
[16] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[17] R. Bartlett, et al. The full CCSDT model for molecular electronic structure, 1987.
[18] Michael Steffen Oliver Franz, et al. Code-Generation On-the-Fly: A Key to Portable Software, 1994.
[19] R. Bartlett, et al. Recursive intermediate factorization and complete computational linearization of the coupled-cluster single, double, triple, and quadruple excitation equations, 1991.
[20] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[21] N. Oliphant, et al. Coupled-cluster method truncated at quadruples, 1991.
[22] Joel H. Saltz, et al. An Integrated Runtime and Compile-Time Approach for Parallelizing Structured and Block Structured Applications, 1995, IEEE Trans. Parallel Distributed Syst.
[23] Sven Leyffer, et al. Heuristic static load-balancing algorithm applied to the fragment molecular orbital method, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[24] M. Head-Gordon, et al. A fifth-order perturbation comparison of electron correlation theories, 1989.
[25] S. J. Cole, et al. Towards a full CCSDT model for electron correlation, 1985.
[26] Scott B. Baden, et al. Run-Time Support for Multi-tier Programming of Block-Structured Applications on SMP Clusters, 1997, ISCOPE.
[27] Sriram Krishnamoorthy, et al. Scalable work stealing, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[28] Allen D. Malony, et al. The Tau Parallel Performance System, 2006, Int. J. High Perform. Comput. Appl.
[29] Paolo Bientinesi, et al. Performance Modeling for Dense Linear Algebra, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[30] R. Bartlett, et al. Coupled-cluster open-shell analytic gradients: Implementation of the direct product decomposition approach in energy gradient calculations, 1991.
[31] R. Bartlett, et al. A direct product decomposition approach for symmetry exploitation in many-body methods. I. Energy calculations, 1991.
[32] John Aycock, et al. A brief history of just-in-time, 2003, CSUR.
[33] R. Bartlett, et al. An efficient way to include connected quadruple contributions into the coupled cluster method, 1998.
[34] Sriram Krishnamoorthy, et al. Load Balancing of Dynamical Nucleation Theory Monte Carlo Simulations through Resource Sharing Barriers, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[35] J. Stanton. Why CCSD(T) works: a different perspective, 1997.
[36] R. Bartlett, et al. A full coupled-cluster singles and doubles model: The inclusion of disconnected triples, 1982.
[37] Sriram Krishnamoorthy, et al. Performance characterization of global address space applications: a case study with NWChem, 2012, Concurr. Comput. Pract. Exp.
[38] Guy L. Steele. Debunking the “expensive procedure call” myth or, procedure call implementations considered harmful or, LAMBDA: The Ultimate GOTO, 1977, ACM '77.
[39] S. Hirata. Tensor Contraction Engine: Abstraction and Automated Parallel Implementation of Configuration-Interaction, Coupled-Cluster, and Many-Body Perturbation Theories, 2003.
[40] T. Crawford, et al. An Introduction to Coupled Cluster Theory for Computational Chemists, 2007.
[41] Don W. Warren, et al. An analysis of a logical machine using parenthesis-free notation, 1954.
[42] Joseph Edwards. An Elementary Treatise on the Differential Calculus: With Applications and Numerous Examples, 2010.
[43] Sally A. McKee, et al. Performance optimization by dynamic code transformation, 2011, CF '11.
[44] David E. Bernholdt, et al. Automatic code generation for many-body electronic structure methods: the tensor contraction engine, 2006.
[45] Scott B. Baden, et al. Efficient Run-Time Support for Irregular Block-Structured Applications, 1998, J. Parallel Distributed Comput.
[46] R. Bartlett, et al. The coupled-cluster single, double, triple, and quadruple excitation method, 1992.
[47] Ronald L. Graham, et al. Bounds on Multiprocessing Timing Anomalies, 1969, SIAM Journal on Applied Mathematics.
[48] Mihály Kállay, et al. Approximate treatment of higher excitations in coupled-cluster theory, 2005, The Journal of Chemical Physics.
[49] Courtenay T. Vaughan, et al. Zoltan data management services for parallel dynamic applications, 2002, Comput. Sci. Eng.
[50] J. Cizek. On the Correlation Problem in Atomic and Molecular Systems. Calculation of Wavefunction Components in Ursell-Type Expansion Using Quantum-Field Theoretical Methods, 1966.
[51] R. Bartlett, et al. Coupled-cluster theory in quantum chemistry, 2007.
[52] Robert J. Harrison, et al. Portable tools and applications for parallel computers, 1991.
[53] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[54] Riccardo Poli, et al. Particle swarm optimization, 1995, Swarm Intelligence.
[55] James Demmel, et al. Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.