Programming many-core architectures - a case study: dense matrix computations on the Intel single-chip cloud computer processor
Robert A. van de Geijn | Timothy G. Mattson | Rob F. Van der Wijngaart | Bryan Marker | Jack Poulson | Ernie Chan | Theodore E. Kubaska