software (SANS) effort
Victor Eijkhout | Erika Fuentes | Jack Dongarra | Zizhong Chen | Julien Langou | Sathish S. Vadhiyar | George Bosilca | Jelena Pjesivac-Grbovic | Graham E. Fagg | Keith Seymour | Piotr Luszczek
[1] David A. Padua, et al. Advanced compiler optimizations for supercomputers, 1986, CACM.
[2] Charles L. Lawson, et al. Algorithm 539: Basic Linear Algebra Subprograms for Fortran Usage [F1], 1979, TOMS.
[3] Utpal Banerjee, et al. A theory of loop permutations, 1990.
[4] George Bosilca, et al. Recovery Patterns for Iterative Methods in a Parallel Unstable Environment, 2007, SIAM J. Sci. Comput.
[5] Robert A. van de Geijn, et al. Building a high-performance collective communication library, 1994, Proceedings of Supercomputing '94.
[6] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPLAS.
[7] Ken Kennedy, et al. Automatic blocking of QR and LU factorizations for locality, 2004, MSP '04.
[8] Mario Lauria, et al. Efficient implementation of reduce-scatter in MPI, 2003, J. Syst. Archit.
[9] Jan Karel Lenstra, et al. Approximation algorithms for scheduling unrelated parallel machines, 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[10] Vipin Kumar, et al. Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning (Distinguished Paper), 2000, Euro-Par.
[11] Jack J. Dongarra, et al. Performance Analysis of MPI Collective Operations, 2005, IPDPS.
[12] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.
[13] John A. Nelder, et al. A Simplex Method for Function Minimization, 1965, Comput. J.
[14] Jason Duell, et al. An evaluation of current high-performance networks, 2003, Proceedings International Parallel and Distributed Processing Symposium.
[15] Markus Schordan, et al. Classification and Utilization of Abstractions for Optimization, 2004, ISoLA.
[16] Sathish S. Vadhiyar, et al. Automatically Tuned Collective Communications, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[17] Jack J. Dongarra, et al. The LINPACK Benchmark: past, present and future, 2003, Concurr. Comput. Pract. Exp.
[18] Jesper Larsson Träff, et al. More Efficient Reduction Algorithms for Non-Power-of-Two Number of Processors in Message-Passing Parallel Systems, 2004, PVM/MPI.
[19] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[20] David B. Shmoys, et al. A Polynomial Approximation Scheme for Scheduling on Uniform Processors: Using the Dual Approximation Approach, 1988, SIAM J. Comput.
[21] Jack J. Dongarra, et al. Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing, 1997, J. Parallel Distributed Comput.
[22] Peter Sanders, et al. A bandwidth latency tradeoff for broadcast and reduction, 2003, Inf. Process. Lett.
[23] Gang Ren, et al. A comparison of empirical and model-driven optimization, 2003, PLDI '03.
[24] William Gropp, et al. Beowulf Cluster Computing with Linux, 2003.
[25] Jaeyoung Choi, et al. Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, 1994, Sci. Program.
[26] Victor Eijkhout, et al. Self-Adapting Linear Algebra Algorithms and Software, 2005, Proceedings of the IEEE.
[27] Jack Dongarra, et al. Automatic Blocking of Nested Loops, 1990.
[28] Qing Yi, et al. Applying Loop Optimizations to Object-Oriented Abstractions Through General Classification of Array Semantics, 2004, LCPC.
[29] Kees Verstoep, et al. Fast Measurement of LogP Parameters for Message Passing Platforms, 2000, IPDPS Workshops.
[30] Jack Dongarra, et al. A Fault-Tolerant Communication Library for Grid Environments, 2003.
[31] Jordan Gergov, et al. Approximation Algorithms for Dynamic Storage Allocation, 1996.
[32] Viggo Kann, et al. Strong Lower Bounds on the Approximability of some NPO PB-Complete Maximization Problems, 1995, MFCS.
[33] Rajeev Thakur, et al. Improving the Performance of Collective Operations in MPICH, 2003, PVM/MPI.
[34] Ramesh Subramonian, et al. LogP: towards a realistic model of parallel computation, 1993, PPOPP '93.
[35] Chris J. Scheiman, et al. LogGP: incorporating long messages into the LogP model - one step closer towards a realistic model for parallel computation, 1995, SPAA '95.
[36] R. Rabenseifner, et al. Automatic MPI Counter Profiling of All Users: First Results on a CRAY T3E 900-512, 2004.
[37] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.
[38] G.E. Moore, et al. Cramming More Components Onto Integrated Circuits, 1998, Proceedings of the IEEE.
[39] Christian Engelmann, et al. Development of Naturally Fault Tolerant Algorithms for Computing on 100,000 Processors, 2002.
[40] Edoardo Amaldi, et al. On the Approximability of Minimizing Nonzero Variables or Unsatisfied Relations in Linear Systems, 1998, Theor. Comput. Sci.
[41] Henri E. Bal, et al. MagPIe: MPI's collective communication operations for clustered wide area systems, 1999, PPoPP '99.
[42] Richard P. Martin, et al. Assessing Fast Network Interfaces, 1996, IEEE Micro.
[43] Sathish S. Vadhiyar, et al. Towards an Accurate Model for Collective Communications, 2004, Int. J. High Perform. Comput. Appl.
[44] Jack J. Dongarra, et al. HARNESS and fault tolerant MPI, 2001, Parallel Comput.
[45] Roger W. Hockney, et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2, 1994, Parallel Computing.