Chip multiprocessing and the Cell Broadband Engine

Chip multiprocessing has become an exciting new direction for system designers to deliver increased performance by exploiting CMOS scaling. We discuss key design decisions facing the system architect of a chip multiprocessor and describe how these choices were made in the design of the Cell Broadband Engine.

An important decision is whether to base system performance on thread-level parallelism alone, or to complement thread-level parallelism with other forms of parallelism. Depending on workload characteristics, providing parallelism at the processor core level may increase overall system efficiency.

Parallelism is also key to using available memory bandwidth more efficiently, by overlapping and interleaving multiple accesses to system memory. Interleaving the access streams of multiple threads increases memory-level parallelism and allows better utilization of the memory interface. In addition, compute-transfer parallelism (CTP) offers a new form of parallelism, in which memory transfers are initiated under software control without stalling the requesting thread.

We describe how the Cell Broadband Engine uses parallelism at all levels of the system abstraction to deliver a quantum leap in application performance, and how the Cell Synergistic Memory Flow engine exploits compute-transfer parallelism by providing efficient block transfer capabilities.
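The idea behind compute-transfer parallelism can be illustrated with a small double-buffering sketch. This is not Cell code and does not use any Cell API: a single-worker thread pool stands in for a block transfer engine, and the names `fetch_block`, `sum_blocks_double_buffered`, and `transfer_engine` are hypothetical, chosen only for this illustration. The point is that the requesting thread starts the transfer of the next block and keeps computing on the current one.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(memory, start, size):
    """Hypothetical stand-in for a block transfer: copy one block out of 'system memory'."""
    return memory[start:start + size]


def sum_blocks_double_buffered(memory, block_size):
    """Sum all elements, requesting block i+1 before processing block i."""
    total = 0
    # One worker plays the role of the transfer engine (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=1) as transfer_engine:
        # Start the first transfer before any computation begins.
        pending = transfer_engine.submit(fetch_block, memory, 0, block_size)
        for next_start in range(block_size, len(memory) + block_size, block_size):
            current = pending.result()  # wait only for the block needed now
            if next_start < len(memory):
                # Initiate the next transfer; submit() returns without waiting for it.
                pending = transfer_engine.submit(
                    fetch_block, memory, next_start, block_size
                )
            total += sum(current)  # computation proceeds while the transfer is in flight
    return total
```

In this sketch the thread blocks only when the block it needs has not yet arrived, so transfer latency can be hidden behind computation whenever processing a block takes at least as long as fetching the next one.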
