The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor

As CMOS feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance (e.g., deep pipelining) become too expensive and power-hungry, chip multiprocessors (CMPs) become an exciting new direction by which system designers can deliver increased performance. Exploiting parallelism in such designs is the key to high performance, and we find that parallelism must be exploited at multiple levels of the system: the thread-level parallelism that has become popular in many designs fails to exploit all the levels of available parallelism in many workloads for CMP systems. We describe the Cell Broadband Engine and the multiple levels at which its architecture exploits parallelism: data-level, instruction-level, thread-level, memory-level, and compute-transfer parallelism. By taking advantage of opportunities at all levels of the system, this CMP revolutionizes parallel architectures to deliver previously unattained levels of single chip performance. We describe how the heterogeneous cores allow to achieve this performance by parallelizing and offloading computation intensive application code onto the Synergistic Processor Element (SPE) cores using a heterogeneous thread model with SPEs. We also give an example of scheduling code to be memory latency tolerant using software pipelining techniques in the SPE.

[1]  Michael Gschwind Chip multiprocessing and the cell broadband engine , 2006, CF '06.

[2]  K. Steinhubl Design of Ion-Implanted MOSFET'S with Very Small Physical Dimensions , 1974 .

[3]  Robert H. Dennard,et al.  Design of ion-implanted MOSFET's with very small physical dimensions , 2007 .

[4]  H. Peter Hofstee,et al.  Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[5]  Rohit Bhatia,et al.  Montecito: a dual-core, dual-thread Itanium processor , 2005, IEEE Micro.

[6]  Luiz André Barroso,et al.  Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[7]  H. Peter Hofstee,et al.  Power efficient processor architecture and the cell processor , 2005, 11th International Symposium on High-Performance Computer Architecture.

[8]  Tejas Karkhanis,et al.  A Day in the Life of a Data Cache Miss , 2002 .

[9]  Ajay K. Royyuru,et al.  Blue Gene: A vision for protein science using a petaflop supercomputer , 2001, IBM Syst. J..

[10]  Andrew F. Glew MLP yes! ILP no , 1998, ASPLOS 1998.

[11]  José E. Moreira,et al.  Evaluation of a multithreaded architecture for cellular computing , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[12]  Alan Gara,et al.  Exploiting Workload Parallelism for Performance and Power Optimization in Blue Gene , 2006, IEEE Micro.

[13]  Michael Gschwind,et al.  Power and performance optimization at the system level , 2005, CF '05.

[14]  Brian Fahs,et al.  Microarchitecture optimizations for exploiting memory-level parallelism , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[15]  Martin Hopkins,et al.  Synergistic Processing in Cell's Multicore Architecture , 2006, IEEE Micro.

[16]  Clayton M. Christensen The Innovator's Dilemma , 1997 .

[17]  Scott Clark,et al.  Cell broadband engine interconnect and memory interface , 2005, 2005 IEEE Hot Chips XVII Symposium (HCS).

[18]  Pradip Bose,et al.  Optimizing pipelines for power and performance , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[19]  Sally A. McKee,et al.  Hitting the memory wall: implications of the obvious , 1995, CARN.

[20]  Michael Gschwind,et al.  Optimizing Compiler for the CELL Processor , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[21]  Martin Hopkins,et al.  A novel SIMD architecture for the cell heterogeneous chip-multiprocessor , 2005, 2005 IEEE Hot Chips XVII Symposium (HCS).

[22]  B. Flachs,et al.  The microarchitecture of the synergistic processor for a cell processor , 2006, IEEE Journal of Solid-State Circuits.