Evaluation of a Multithreaded Architecture for Defense Applications.

Abstract: Multithreading has received considerable attention in recent years as a promising way to hide memory latency in high-performance computers while providing access to a large, uniform shared memory. Tera Computer of Seattle has designed and built a state-of-the-art multithreaded computer called the MTA. Its intended benefits are high processor utilization, scalable performance on applications that are difficult to parallelize, and reduced programming effort. The largest MTA, and the only one outside Seattle, is at the San Diego Supercomputer Center (SDSC) on the campus of the University of California, San Diego (UCSD); it currently has 8 processors. This report describes a two-year project that evaluated the performance and usability of the MTA on 14 defense-relevant applications: seven standard kernels, five mini-applications, and two large applications. The evaluation was led by researchers at UCSD, with collaborators at Caltech, Tera, Boeing, and Sanders/Lockheed Martin. UCSD researchers also carried out studies of multithreaded schedulers and compilers. The principal findings of the project are presented in the enclosed final report.
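The latency-hiding idea behind the claimed high processor utilization can be illustrated with a deliberately idealized model. The sketch below is an assumption for illustration, not a description of the MTA's actual pipeline: every instruction is treated as a memory reference whose result a thread must wait `latency` cycles for, and each cycle the processor issues from at most one ready thread. The function name `utilization` and its parameters are hypothetical.

```python
def utilization(num_threads: int, latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which an idealized multithreaded processor issues.

    Hypothetical model (not the MTA's real design): a thread that issues at
    cycle t cannot issue again until cycle t + latency, and the processor
    issues from at most one ready thread per cycle.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may next issue
    issued = 0
    for t in range(cycles):
        # Issue from the first ready thread found, if any.
        for i in range(num_threads):
            if ready_at[i] <= t:
                ready_at[i] = t + latency  # thread now waits on its memory reference
                issued += 1
                break
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 50, 100, 200):
        print(n, utilization(n, latency=100))
```

In this model utilization is roughly min(1, num_threads / latency): with a 100-cycle latency, a single thread keeps the processor busy only 1% of the time, whereas 100 or more threads keep it busy every cycle. Real workloads mix memory and non-memory instructions, so this is only a qualitative picture of why many concurrent threads are needed.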