Trade-Offs Between Synchronization, Communication, and Computation in Parallel Linear Algebra Computations
James Demmel | Nicholas Knight | Edgar Solomonik | Erin Carson
[1] Bradley C. Kuszmaul,et al. The pochoir stencil compiler , 2011, SPAA '11.
[2] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[3] Chris J. Scheiman,et al. LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation , 1995, SPAA '95.
[4] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl..
[5] V. Strassen. Gaussian elimination is not optimal , 1969 .
[6] Alok Aggarwal,et al. Communication Complexity of PRAMs , 1990, Theor. Comput. Sci..
[7] Alexander Tiskin,et al. Memory-Efficient Matrix Multiplication in the BSP Model , 1999, Algorithmica.
[8] Phillip B. Gibbons. ACM Transactions on Parallel Computing , 2014 .
[9] H. Whitney,et al. An inequality related to the isoperimetric inequality , 1949 .
[10] James Demmel,et al. Minimizing Communication in Linear Algebra , 2009, ArXiv.
[11] James Demmel,et al. Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1 , 2013, ArXiv.
[12] James Demmel,et al. Avoiding communication in sparse matrix computations , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[13] Danny C. Sorensen,et al. Analysis of Pairwise Pivoting in Gaussian Elimination , 1985, IEEE Transactions on Computers.
[14] Christos H. Papadimitriou,et al. A Communication-Time Tradeoff , 1987, SIAM J. Comput..
[15] Optimal Schedules for d-D Grid Graphs with Communication Delays (Extended Abstract) , 1996, STACS.
[16] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[17] L. R. Ford,et al. NETWORK FLOW THEORY , 1956 .
[18] Michael T. Heath,et al. Parallel solution of triangular systems on distributed-memory multiprocessors , 1988 .
[19] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[20] Dror Irony,et al. Trading Replication for Communication in Parallel Distributed-Memory Dense Solvers , 2002, Parallel Process. Lett..
[21] Michael A. Bender,et al. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model , 2007, SPAA '07.
[22] Sivan Toledo,et al. Efficient out-of-core algorithms for linear relaxation using blocking covers , 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.
[23] Richard Bellman,et al. ON A ROUTING PROBLEM , 1958 .
[24] Alexander Tiskin. Communication-efficient parallel generic pairwise elimination , 2007, Future Gener. Comput. Syst..
[25] James Demmel,et al. Tradeoffs between synchronization, communication, and computation in parallel linear algebra computations , 2014, SPAA.
[26] Michele Scquizzato,et al. Communication Lower Bounds for Distributed-Memory Computations , 2013, STACS.
[27] Stephen Warshall,et al. A Theorem on Boolean Matrices , 1962, JACM.
[28] James Demmel,et al. CALU: A Communication Optimal LU Factorization Algorithm , 2011, SIAM J. Matrix Anal. Appl..
[29] Alexander Tiskin,et al. The design and analysis of bulk-synchronous parallel algorithms , 1998 .
[30] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[31] F. P. Preparata,et al. Processor—Time Tradeoffs under Bounded-Speed Message Propagation: Part I, Upper Bounds , 1995, Theory of Computing Systems.
[32] A. Tiskin. Bulk-Synchronous Parallel Gaussian Elimination , 2002 .
[33] James Demmel,et al. Minimizing communication in sparse matrix solvers , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[34] Franco P. Preparata,et al. Processor—Time Tradeoffs under Bounded-Speed Message Propagation: Part II, Lower Bounds , 1999, Theory of Computing Systems.
[35] Jack Dongarra,et al. ScaLAPACK user's guide , 1997 .
[36] James Demmel,et al. Avoiding Communication in Nonsymmetric Lanczos-Based Krylov Subspace Methods , 2013, SIAM J. Sci. Comput..
[37] Alexander Tiskin,et al. All-Pairs Shortest Paths Computation in the BSP Model , 2001, ICALP.
[38] Evripidis Bampis,et al. Optimal Schedules for d-D Grid Graphs with Communication Delays , 1998, Parallel Comput..
[39] J. Demmel,et al. Tradeoffs between synchronization, communication, and work in parallel linear algebra computations , 2014 .
[40] James Demmel,et al. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds , 2012, SPAA '12.
[41] Robert A. van de Geijn,et al. Elemental: A New Framework for Distributed Memory Dense Matrix Computations , 2013, TOMS.
[42] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[43] Gianfranco Bilardi,et al. A Lower Bound Technique for Communication on BSP with Application to the FFT , 2012, Euro-Par.
[44] J. Demmel,et al. Avoiding Communication in Computing Krylov Subspaces , 2007 .