CP-PACS: a massively parallel processor for large scale scientific calculations

CP-PACS (Computational Physics by Parallel Array Computer System) is a massively parallel processor with 2048 processing units, built at the Center for Computational Physics, University of Tsukuba. It has an MIMD architecture with a distributed memory system. The node processor of CP-PACS is a RISC microprocessor enhanced with a Pseudo Vector Processing feature, which enables high-performance vector processing. The interconnection network is a 3-dimensional Hyper-Crossbar Network, which offers high flexibility and embeddability for various network topologies and communication patterns. The theoretical peak performance of the whole system is 614.4 GFLOPS. In this paper, we give an overview of the CP-PACS architecture and describe several of its distinctive architectural features. We then present performance evaluations of both a single node processor and the parallel system, based on LINPACK and the Kernel CG of the NAS Parallel Benchmarks. These evaluations demonstrate the effectiveness of Pseudo Vector Processing and the Hyper-Crossbar Network.
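As a rough illustration of two figures in the abstract, the Python sketch below derives the per-PU peak implied by 614.4 GFLOPS over 2048 PUs (300 MFLOPS, assuming every PU contributes equally) and models message routing in a 3-D hyper-crossbar. The routing model is an assumption, not the machine's documented router: it takes a fixed x, y, z dimension order and treats every line of nodes sharing two coordinates as joined by one crossbar. The 8×16×16 shape is hypothetical, chosen only because it multiplies to 2048, and all helper names are illustrative.

```python
import math
from itertools import product

# Figures taken from the abstract.
TOTAL_PEAK_GFLOPS = 614.4
NUM_PUS = 2048

# Hypothetical 3-D arrangement, chosen only because 8 * 16 * 16 == 2048.
SHAPE = (8, 16, 16)


def per_pu_peak_mflops(total_gflops: float, num_pus: int) -> float:
    """Peak MFLOPS of one PU, assuming the total peak is split evenly."""
    return total_gflops * 1000.0 / num_pus


def dimension_order_route(src, dst):
    """Nodes visited when coordinates are corrected in x, y, z order (assumed).

    Each step changes exactly one coordinate, i.e. one trip through the
    crossbar that joins all nodes lying on that line of the array.
    """
    path = [tuple(src)]
    cur = list(src)
    for dim in range(len(cur)):
        if cur[dim] != dst[dim]:
            cur[dim] = dst[dim]
            path.append(tuple(cur))
    return path


def crossbar_hops(src, dst) -> int:
    """Number of crossbar traversals on the dimension-order route."""
    return len(dimension_order_route(src, dst)) - 1


def max_hops(shape) -> int:
    """Brute-force network diameter in crossbar hops (use small shapes only)."""
    nodes = list(product(*(range(n) for n in shape)))
    return max(crossbar_hops(a, b) for a in nodes for b in nodes)


if __name__ == "__main__":
    assert math.prod(SHAPE) == NUM_PUS
    print(per_pu_peak_mflops(TOTAL_PEAK_GFLOPS, NUM_PUS))  # 300.0
    print(dimension_order_route((0, 0, 0), (7, 15, 15)))
    # A 3-D mesh neighbour differs in one coordinate: one crossbar hop.
    print(crossbar_hops((3, 5, 5), (3, 6, 5)))  # 1
    print(max_hops((2, 3, 4)))  # 3
```

Under this model, any two nodes are at most three crossbar hops apart, and neighbours of an embedded 3-D mesh (or a ring along one axis) are a single hop apart, which is one way to read the abstract's claim about embeddability.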