Architectural implications of a family of irregular applications

Irregular applications based on sparse matrices are at the core of many important scientific computations. Since the importance of such applications is likely to increase in the future, high-performance parallel and distributed systems must provide adequate support for them. We characterize a family of irregular scientific applications and derive the demands they will place on the communication systems of future parallel systems. The running time of these applications is dominated by repeated sparse matrix-vector product (SMVP) operations. Using simple performance models of the SMVP, we investigate requirements for bisection bandwidth, sustained bandwidth on each processing element (PE), burst bandwidth during block transfers, and block latencies, under different assumptions about sustained computational throughput. Our model indicates that block latencies are likely to be the most problematic engineering challenge for future communication networks.
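As context for the kernel the abstract refers to, a sparse matrix-vector product over a matrix stored in compressed sparse row (CSR) form can be sketched as below. This is a generic illustration of the SMVP operation, not the paper's implementation; the example matrix and all names are illustrative.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        # Irregular access: x is gathered through col_idx, which is
        # what makes the memory/communication pattern data-dependent.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 3x3 example: A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]
print(csr_spmv(row_ptr, col_idx, values, x))  # [3.0, 3.0, 9.0]
```

The data-dependent gather of `x` through `col_idx` is the source of the irregular communication whose bandwidth and latency demands the paper models.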
