Paper information - Limitations of the PlayStation 3 for High Performance Cluster Computing

Power consumption, heat dissipation and other physical limitations are pushing the microprocessor industry towards multicore design patterns. Most processor manufacturers, such as Intel and AMD, are following more conventional approaches, which consist of homogeneous, symmetric multicores where execution units are replicated on the same die; multiple execution units share some cache level (generally L2 and L3) and the bus to memory. Other manufacturers have proposed approaches that are still homogeneous but place a stronger emphasis on parallelism and hyperthreading. This is, for example, the case of Sun with the UltraSPARC T1 (known as “Niagara”). The UltraSPARC T1 [25,24] can have up to eight homogeneous cores, each of which is four-way hyperthreaded, for a maximum degree of parallelism of thirty-two (8 × 4). The Niagara processor is designed mainly for web servers and database applications, since it provides high computational power for integer operations, which are used heavily in pointer arithmetic and string processing. Yet other chip manufacturers have started exploring heterogeneous designs where cores have different architectural features. One such example is the Cell Broadband Engine [22,17,19,18] developed by STI, a consortium formed by Sony, Toshiba and IBM. The Cell BE has outstanding floating-point computational power, which makes it a serious candidate for high performance computing systems. IBM shipped the first Cell-based system, the BladeCenter QS20, on September 12th, 2006. This blade is equipped with two Cell processors, each with 512 MB of memory, connected in a NUMA configuration; external connectivity is provided through Gigabit Ethernet and InfiniBand network interfaces. The BladeCenter QS20 has impressive computational power that, coupled with its high speed network interfaces, makes it a good candidate for high performance cluster computing. At almost the same time (November 11th), Sony released the PlayStation 3 (PS3) gaming console. Although this console is not meant for high performance computing, it is still equipped with a (stripped down) Cell processor, and its price ($600) definitely makes it an attractive solution for building a Cell-based cluster. This document aims at evaluating the performance and the limitations of the PS3 platform for high performance cluster computing.

[1] Ramesh C. Agarwal, et al. A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication, 1994, IBM J. Res. Dev.

[2] Lynn Elliot Cannon, et al. A cellular computer to implement the Kalman filter algorithm, 1969.

[3] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.

[4] William Gropp, et al. User's guide for mpich, a portable implementation of MPI, 1996.

[5] Fabrizio Petrini, et al. Cell Multiprocessor Communication Network: Built for Speed, 2006, IEEE Micro.

[6] Rosa M. Badia, et al. CellSs: a Programming Model for the Cell BE Architecture, 2006, ACM/IEEE SC 2006 Conference (SC'06).

[7] Jack J. Dongarra, et al. Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization, 2008, IEEE Transactions on Parallel and Distributed Systems.

[8] Robert Strzodka, et al. Exploring weak scalability for FEM calculations on a GPU-enhanced cluster, 2007, Parallel Comput.

[9] Jaeyoung Choi, et al. PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers, 1994, Concurr. Pract. Exp.

[10] H. Peter Hofstee, et al. Introduction to the Cell multiprocessor, 2005, IBM J. Res. Dev.

[11] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[12] S. Asano, et al. The design and implementation of a first-generation CELL processor, 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005.

[13] Robert A. van de Geijn, et al. SUMMA: scalable universal matrix multiplication algorithm, 1995, Concurr. Pract. Exp.

[14] William Gropp, et al. MPICH2: A New Start for MPI Implementations, 2002, PVM/MPI.

[15] H. Peter Hofstee, et al. Power efficient processor architecture and the cell processor, 2005, 11th International Symposium on High-Performance Computer Architecture.

[16] Jack Dongarra, et al. SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3, 2007.

[17] Robert A. van de Geijn, et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm, 1995.