Supporting a flexible parallel programming model on a network of workstations

We introduce a prototype software shared-memory system for executing programs with nested parallelism on a network of workstations. The programming model supports a convenient and natural programming style and provides functionality similar to a subset of Compositional C++. Such a programming model is especially suitable for computations whose complexity and parallelism emerge only during execution, as in divide-and-conquer problems. To both support and exploit the flexibility inherent in the programming model, we develop an architecture that distributes both the shared memory management and the computation, removing the bottlenecks inherent in centralization and thereby providing scalability and dependability. The system also supports dynamic load balancing and fault tolerance, both transparently to the programmer. The prototype performs well on a realistic platform: a non-dedicated network of workstations. We describe encouraging performance experiments on a network in which some of the machines became slow unpredictably (from the application program's point of view). The system coped well with such dynamic behavior.
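To make the notion of nested parallelism concrete, the following is a minimal sketch of a divide-and-conquer computation in which each parallel branch may itself fork further branches at run time. It is not the system's API or the Compositional C++ notation: the function name `parallel_sum` and the `cutoff` parameter are hypothetical, and local Python threads merely stand in for tasks that the described system would distribute across workstations.

```python
import threading


def parallel_sum(values, lo=0, hi=None, cutoff=1024):
    """Sum values[lo:hi] by divide and conquer with nested fork-join.

    Hypothetical illustration only: threads on one machine stand in for
    tasks scheduled on a network of workstations.
    """
    if hi is None:
        hi = len(values)
    # Small ranges are summed sequentially; parallelism emerges only
    # when the range is large enough to be worth splitting.
    if hi - lo <= cutoff:
        return sum(values[lo:hi])

    mid = (lo + hi) // 2
    results = [0, 0]

    def run(slot, a, b):
        # Each branch may recursively fork again (nested parallelism).
        results[slot] = parallel_sum(values, a, b, cutoff)

    left = threading.Thread(target=run, args=(0, lo, mid))
    left.start()          # fork the left half
    run(1, mid, hi)       # compute the right half in the current thread
    left.join()           # join before combining partial results
    return results[0] + results[1]
```

The depth of the fork tree here depends on the input size rather than on a fixed number of workers, which is the kind of dynamically emerging parallelism that the abstract says the model is meant to express.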
