Multi- and Many-Cores, Architectural Overview for Programmers

Parallelism has been used since the early days of computing to enhance performance. From the first computers to the most modern sequential processors (also called uniprocessors), the main concepts introduced by von Neumann [20] are still in use. However, the ever-increasing demand for computing performance has pushed computer architects toward implementing different techniques of parallelism. The von Neumann architecture was initially a sequential machine operating on scalar data with bit-serial operations [20]. Word-parallel operations became possible with more complex logic that performs a binary operation on all the bits of a computer word at once. This was only the first of many innovations in parallel computer architecture.

[1]  Kunle Olukotun, et al. The case for a single-chip multiprocessor, 1996, ASPLOS VII.

[2]  H. Peter Hofstee, et al. Introduction to the Cell multiprocessor, 2005, IBM J. Res. Dev.

[3]  CACM Staff, et al. A conversation with David E. Shaw, 2009.

[4]  Volodymyr Kindratenko, et al. Implementation of scientific computing applications on the Cell Broadband Engine, 2009, HiPC 2009.

[5]  Mark D. Hill, et al. Amdahl's Law in the Multicore Era, 2008.

[6]  H. T. Kung. Why systolic architectures?, 1982, Computer.

[7]  Surendra Byna, et al. Taxonomy of Data Prefetching for Multicore Processors, 2009, Journal of Computer Science and Technology.

[8]  David F. Hendry, et al. The computer as von Neumann planned it, 1993, IEEE Annals of the History of Computing.

[9]  H. Franke, et al. Introduction to the wire-speed processor and architecture, 2010, IBM J. Res. Dev.

[10]  Vikas Agarwal, et al. Clock rate versus IPC: the end of the road for conventional microarchitectures, 2000, Proceedings of 27th International Symposium on Computer Architecture.

[11]  Kunle Olukotun, et al. Niagara: a 32-way multithreaded Sparc processor, 2005, IEEE Micro.

[12]  Richard M. Brown, et al. The ILLIAC IV Computer, 1968, IEEE Transactions on Computers.

[13]  Per Stenström, et al. A Cache-Partitioning Aware Replacement Policy for Chip Multiprocessors, 2006, HiPC.

[14]  Michael J. Flynn, et al. Some Computer Organizations and Their Effectiveness, 1972, IEEE Transactions on Computers.

[15]  Marius Grannæs, et al. Reducing Memory Latency by Improving Resource Utilization, 2010.

[16]  G. Amdahl, et al. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).

[17]  Jean-Loup Baer, et al. Effective Hardware Based Data Prefetching for High-Performance Processors, 1995, IEEE Trans. Computers.

[18]  Ahmed Amine Jerraya, et al. Multiprocessor System-on-Chip (MPSoC) Technology, 2008, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[19]  Michael J. Flynn, et al. Very high-speed computing systems, 1966.

[20]  Samuel H. Fuller, et al. Computing Performance: Game Over or Next Level?, 2011, Computer.

[21]  Dimitrios S. Nikolopoulos, et al. Programming Multiprocessors with Explicitly Managed Memory Hierarchies, 2009, Computer.

[22]  Norman P. Jouppi, et al. Heterogeneous chip multiprocessors, 2005, Computer.

[23]  Haakon Dybdahl, et al. Architectural Techniques to Improve Cache Utilization, 2007.

[24]  Norman P. Jouppi, et al. Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction, 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.

[25]  Christopher Dyken, et al. State-of-the-art in heterogeneous computing, 2010, Sci. Program.

[26]  Erik Duval, et al. Managing Shared Resources, 2000, EuroPLoP.

[27]  Anant Agarwal, et al. rMPI: Message Passing on Multicore Processors with On-Chip Interconnect, 2008, HiPEAC.

[28]  David A. Patterson, et al. Latency lags bandwith, 2004, CACM.

[29]  Kunle Olukotun, et al. The Future of Microprocessors, 2005, ACM Queue.

[30]  Angela C. Sodan, et al. Parallelism via Multithreaded and Multicore CPUs, 2010, Computer.

[31]  Santosh G. Abraham, et al. Chip multithreading: opportunities and challenges, 2005, 11th International Symposium on High-Performance Computer Architecture.

[32]  Olaf René Birkeland, et al. A recursive MISD architecture for pattern matching, 2004, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[33]  Ken Kennedy, et al. Software prefetching, 1991, ASPLOS IV.