DYCE: A Resilient Shared Memory Paradigm for Heterogeneous Distributed Systems without Memory Coherence

Parallel programming paradigms are commonly characterized by the core metrics of scalability, memory use, ease of use, hardware requirements, and resiliency. Support for heterogeneous environments, for example a mix of CPUs and accelerators, is of increasing interest. Analyzing the semantics and costs of different classes of parallel programming paradigms leads to DYCE (Distributed Yet Common Environment), a shared-memory, expressive yet hardware-friendly, race- and deadlock-free parallel programming paradigm that provides resiliency without explicit checkpointing code. Pointer-based structures that span the memory of multiple heterogeneous compute devices are possible. Importantly, data exchange is independent of the specific data structures and requires no serialization or deserialization code, even for structures such as a dynamic linked radix tree of strings. The analysis shows that DYCE does not require coherence from the system and can therefore execute with near-minimal overhead and hardware requirements, including the page-table cost of large unified address spaces spanning many devices. We demonstrate efficacy with a prototype.
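The claim that pointer-based structures can span devices and be exchanged without serialization is commonly realized with position-independent (offset-based) links inside a shared segment. The C++ sketch below illustrates that general technique only; the names (Arena, Node, offset_t) and the byte-copy standing in for an inter-device mapping are illustrative assumptions, not the actual DYCE API, which the abstract does not specify.

    // Hypothetical sketch (not the DYCE API): a linked structure stored
    // position-independently in a flat arena, with offsets instead of raw
    // pointers. Because links are offsets, the raw bytes remain valid when
    // the segment is mapped or copied to another device at a different base
    // address, so no structure-specific serialization code is needed.
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <iostream>
    #include <vector>

    using offset_t = std::uint32_t;  // offset into the arena; 0 acts as "null"
    constexpr offset_t kNull = 0;

    struct Arena {
        // Slot 0 is reserved so that offset 0 can represent a null link.
        std::vector<std::byte> buf = std::vector<std::byte>(1);

        offset_t alloc(std::size_t n, std::size_t align) {
            std::size_t off = (buf.size() + align - 1) / align * align;
            buf.resize(off + n);
            return static_cast<offset_t>(off);   // bump allocation only
        }
    };

    struct Node {          // trivially copyable: safe to memcpy across devices
        int      value;
        offset_t next;     // offset link, not a raw pointer
    };

    offset_t push(Arena& a, offset_t head, int v) {
        offset_t off = a.alloc(sizeof(Node), alignof(Node));
        Node n{v, head};
        std::memcpy(a.buf.data() + off, &n, sizeof n);
        return off;
    }

    int main() {
        Arena a;
        offset_t head = kNull;
        for (int v : {1, 2, 3}) head = push(a, head, v);

        // "Exchange": a plain byte copy stands in for mapping the same
        // segment on a second device; no per-type (de)serializer is run.
        Arena b;
        b.buf = a.buf;

        for (offset_t it = head; it != kNull; ) {
            Node n;
            std::memcpy(&n, b.buf.data() + it, sizeof n);
            std::cout << n.value << '\n';        // prints 3 2 1
            it = n.next;
        }
    }

The design choice worth noting is that offset links trade a small amount of address arithmetic on every dereference for full relocatability of the structure, which is what makes structure-independent data exchange possible without coherence support from the hardware.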
