A preliminary analysis of Cyclops Tensor Framework
暂无分享,去创建一个
[1] Alok Aggarwal,et al. Communication Complexity of PRAMs , 1990, Theor. Comput. Sci..
[2] Ramesh C. Agarwal,et al. A three-dimensional approach to parallel matrix multiplication , 1995, IBM J. Res. Dev..
[3] Robert A. van de Geijn,et al. Elemental: A New Framework for Distributed Memory Dense Matrix Computations , 2013, TOMS.
[4] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[5] David E. Bernholdt,et al. Automatic code generation for many-body electronic structure methods: the tensor contraction engine , 2006 .
[6] James Demmel,et al. Minimizing Communication in Linear Algebra , 2009, ArXiv.
[7] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[8] Sriram Krishnamoorthy,et al. Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations , 2005, LCPC.
[9] Jack Dongarra,et al. ScaLAPACK user's guide , 1997 .
[10] Robert A. van de Geijn,et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm , 1995 .
[11] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[12] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[13] James Demmel,et al. Improving communication performance in dense linear algebra via topology aware collectives , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[14] Lynn Elliot Cannon,et al. A cellular computer to implement the kalman filter algorithm , 1969 .
[15] S. Lennart Johnsson,et al. Minimizing the Communication Time for Matrix Multiplication on Multiprocessors , 1993, Parallel Comput..
[16] David E. Bernholdt,et al. High performance computational chemistry: An overview of NWChem a distributed parallel application , 2000 .
[17] Robert J. Harrison,et al. Global arrays: A nonuniform memory access programming model for high-performance computers , 1996, The Journal of Supercomputing.
[18] Allen D. Malony,et al. The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl..
[19] Sartaj Sahni,et al. Parallel Matrix and Graph Algorithms , 1981, SIAM J. Comput..
[20] David E. Bernholdt,et al. Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models , 2005, Proceedings of the IEEE.
[21] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.