Natural HPC substrate: Exploitation of mixed multicore CPU and GPUs

Recent GPU developments have attracted much interest in the HPC community. Since each GPU requires a dedicated host processor, the high-performance non-GPU processors often sit idle and are simply wasted. Because GPUs are energy intensive and more likely to fail than CPUs, we are interested in using all processors to a) boost application performance, and b) defend against GPU failures. This paper reports parallel computation experiments using a natural semantic multiplexing substrate that we call Deeply Decoupled Parallel Processing (D2P2). The idea is to apply statistical multiplexing to the application's semantic network using application-defined data tuples; tuple space parallel processing is a natural choice for this. We report up to a 53% performance gain at a CPU:GPU capability ratio of 1:5. For faster GPUs, CPUs are better used to prevent the application from halting when a GPU fails. The D2P2 substrate thus enables fault-tolerant parallel processing on heterogeneous processors.
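
The abstract does not give the D2P2 API, so the following is a minimal sketch of the tuple-space idea it describes, assuming Linda-style out/take primitives. All names here (TupleSpace, worker, fail_after, the speed factor) are illustrative, not the paper's implementation: workers withdraw tuples at their own pace, so a faster GPU automatically receives more work (statistical multiplexing), and a failed worker's tuple is re-posted so a surviving processor redoes it (failure defense).

```python
# Minimal tuple-space sketch (assumed Linda-style semantics),
# NOT the paper's D2P2 implementation.
import queue
import threading
import time

class TupleSpace:
    """Toy tuple space: out() posts a tuple, take() withdraws one (blocking)."""
    def __init__(self):
        self._q = queue.Queue()

    def out(self, tup):
        self._q.put(tup)

    def take(self):
        return self._q.get()

def worker(name, speed, space, results, fail_after=None):
    """Withdraw work tuples until a STOP tuple appears.

    Faster workers naturally withdraw more tuples; a simulated failure
    re-posts the current tuple so a surviving worker redoes it.
    """
    done = 0
    while True:
        tag, payload = space.take()
        if tag == "STOP":
            space.out(("STOP", None))   # propagate shutdown to other workers
            return
        if fail_after is not None and done >= fail_after:
            space.out((tag, payload))   # re-post: the work unit is not lost
            return                      # simulate this node dying
        time.sleep(0.01 / speed)        # the faster "GPU" burns through more tuples
        results.put((name, payload, payload * payload))
        done += 1

if __name__ == "__main__":
    space, results = TupleSpace(), queue.Queue()
    for i in range(20):
        space.out(("work", i))          # application-defined data tuples

    cpu = threading.Thread(target=worker, args=("cpu", 1, space, results))
    gpu = threading.Thread(target=worker, args=("gpu", 5, space, results),
                           kwargs={"fail_after": 5})  # 1:5 capability ratio, then "fails"
    cpu.start(); gpu.start()

    finished = [results.get() for _ in range(20)]  # all 20 tuples still complete
    space.out(("STOP", None))
    cpu.join(); gpu.join()
    print(sum(n == "gpu" for n, _, _ in finished), "tuples done by the GPU before it failed")
```

Note how the fault tolerance falls out of the data structure rather than a checkpoint protocol: because the failing worker re-inserts its in-flight tuple, no state of the dead processor needs to be recovered, which is the decoupling property the abstract attributes to D2P2.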
