Parallel Numerical Computing from Illiac IV to Exascale - The Contributions of Ahmed H. Sameh

With exascale computing on the horizon and multicore processors and GPUs in routine use, we survey the achievements of Ahmed H. Sameh, a pioneer in parallel matrix algorithms. His contributions since the days of Illiac IV, the work he directed and inspired in building the Cedar multiprocessor, and his recent research together offer a useful historical perspective on the field of parallel scientific computing.
