software (SANS) effort

The challenge in developing next-generation software is to manage the complex computational environment successfully while delivering to the scientist the full power of flexible compositions of the available algorithmic alternatives. Self-adapting numerical software (SANS) systems are intended to meet this challenge. Arriving at an efficient numerical solution of a problem in computational science involves numerous decisions by a numerical expert. Attempts to automate such decisions distinguish three levels: algorithmic decisions, management of the parallel environment, and processor-specific tuning of kernels. Additionally, at any of these levels we can decide to rearrange the user’s data. In this paper we look at a number of efforts at the University of Tennessee to investigate these areas.
