Network Topologies and Inevitable Contention
James Demmel | Sivan Toledo | Oded Schwartz | Grey Ballard | Benjamin Lipshitz | Andrew Gearhart | Yishai Oltchik
[1] James Demmel,et al. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds , 2012, SPAA '12.
[2] G. Bilardi,et al. Deterministic on-line routing on area-universal networks , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.
[3] James Demmel,et al. Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1 , 2013, ArXiv.
[4] John E. Savage. Extending the Hong-Kung Model to Memory Hierarchies , 1995, COCOON.
[5] Alexander Tiskin,et al. Memory-Efficient Matrix Multiplication in the BSP Model , 1999, Algorithmica.
[6] Alok Aggarwal,et al. Communication Complexity of PRAMs , 1990, Theor. Comput. Sci..
[7] Béla Bollobás,et al. Edge-isoperimetric inequalities in the grid , 1991, Comb..
[8] V. Strassen. Relative bilinear complexity and matrix multiplication. , 1987 .
[9] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[10] P. Heidelberger,et al. The IBM Blue Gene/Q Interconnection Fabric , 2012, IEEE Micro.
[11] Charles E. Leiserson,et al. Randomized Routing on Fat-Trees , 1989, Adv. Comput. Res..
[12] John H. Lindsey,et al. Assignment of Numbers to Vertices , 1964 .
[13] William J. Dally,et al. Technology-Driven, Highly-Scalable Dragonfly Topology , 2008, 2008 International Symposium on Computer Architecture.
[14] Robert A. van de Geijn,et al. Collective communication: theory, practice, and experience , 2007 .
[15] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[16] François Le Gall,et al. Powers of tensors and fast matrix multiplication , 2014, ISSAC.
[17] Katherine A. Yelick,et al. A Communication-Optimal N-Body Algorithm for Direct Interactions , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[18] James Demmel,et al. Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication , 2012, MedAlg.
[19] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[20] James Demmel,et al. Exploiting Data Sparsity in Parallel Matrix Powers Computations , 2013, PPAM.
[21] Robert A. van de Geijn,et al. Collective communication: theory, practice, and experience , 2007, Concurr. Comput. Pract. Exp..
[22] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl..
[23] IBM Blue Gene Team. Overview of the IBM Blue Gene/P Project , 2008, IBM J. Res. Dev..
[24] N. Linial,et al. Expander Graphs and their Applications , 2006 .
[25] Gianfranco Bilardi,et al. A Lower Bound Technique for Communication on BSP with Application to the FFT , 2012, Euro-Par.
[26] James Demmel,et al. Communication optimal parallel multiplication of sparse random matrices , 2013, SPAA.
[27] Michele Scquizzato,et al. Communication Lower Bounds for Distributed-Memory Computations , 2013, STACS.
[28] Arnold Schönhage,et al. Partial and Total Matrix Multiplication , 1981, SIAM J. Comput..
[29] Grey Ballard,et al. Avoiding Communication in Dense Linear Algebra , 2013 .
[30] V. Strassen. Gaussian elimination is not optimal , 1969 .
[31] Oded Schwartz,et al. Matrix Multiplication I/O-Complexity by Path Routing , 2015, SPAA.
[32] Lynn Elliot Cannon,et al. A cellular computer to implement the kalman filter algorithm , 1969 .
[33] Toshiyuki Shimizu,et al. Tofu: A 6D Mesh/Torus Interconnect for Exascale Computers , 2009, Computer.
[34] Lorenzo De Stefani,et al. The I/O Complexity of Strassen's Matrix Multiplication with Recomputation , 2016, WADS.
[35] James Reinders,et al. Intel Xeon Phi Coprocessor High Performance Programming , 2013 .
[36] Jarle Berntsen,et al. Communication efficient matrix multiplication on hypercubes , 1989, Parallel Comput..
[37] Franco P. Preparata,et al. Area-time lower-bound techniques with applications to sorting , 2005, Algorithmica.
[38] Charles E. Leiserson,et al. Fat-trees: Universal networks for hardware-efficient supercomputing , 1985, IEEE Transactions on Computers.
[39] Michael T. Goodrich,et al. Communication-Efficient Parallel Sorting , 1999, SIAM J. Comput..