Graph expansion and communication costs of fast matrix multiplication: regular submission
James Demmel | Oded Schwartz | Grey Ballard | Olga Holtz