CAST: Contraction Algorithm for Symmetric Tensors
Sriram Krishnamoorthy | P. Sadayappan | Samyam Rajbhandari | Kevin Stock | Akshay Nikam | Pai-Wei Lai