Summary of multi-core hardware and programming model investigations

This report summarizes our investigations into multi-core processors and programming models for parallel scientific applications. The study was motivated by the need to better understand the landscape of multi-core hardware, future trends, and their implications for system software on capability supercomputers. Its results are being used as input to the design of a new open-source lightweight kernel operating system targeted at future capability supercomputers built from multi-core processors. A goal of this effort is to create an agile system that can adapt to, and efficiently support, whatever multi-core hardware and programming models gain acceptance in the community.
