Shared Memory in the Many-Core Age

With the evolution toward fast networks of many-core processors, the design assumptions underlying software-level distributed shared memory (DSM) systems change considerably. Efficient DSMs are still needed, however, because they can significantly simplify the implementation of complex distributed algorithms. This paper discusses the implications of the many-core evolution and derives a set of reusable elementary operations for future software DSMs. These elementary operations will help in exploring and evaluating new memory models and consistency protocols.
