M2C: A Massive Performance and Energy Throttling Framework for High-Performance Computing Systems

At the petascale level of performance, High-Performance Computing (HPC) systems rely on supercomputers and extensive parallel programming to solve complex computational tasks. Exascale performance, i.e., 10^18 calculations per second, is the next remarkable milestone in computing, with a profound influence on everyday life. Current technologies face various challenges in reaching ExaFLOP performance with energy-efficient systems; massive parallelism and power consumption are the most critical among them. In this paper, we introduce a novel parallel programming model that delivers massive performance under power-consumption limitations by parallelizing data across a heterogeneous system to provide both coarse-grained and fine-grained parallelism. The proposed dual-hierarchical architecture, called the M2C model, is a hybrid of MVAPICH2 and CUDA for heterogeneous systems that utilizes both CPU and GPU devices to achieve massive parallelism. To validate the objectives of this study, the proposed model was implemented using benchmark applications, including dense matrix multiplication. Furthermore, we conducted a comparative analysis of the proposed model against existing state-of-the-art models and libraries such as MOC, kBLAS, and cuBLAS. The proposed model outperforms existing models while achieving massive performance on HPC clusters and can be considered for emerging exascale computing systems.
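To make the coarse-grain/fine-grain split concrete, the sketch below shows a minimal MPI+CUDA hybrid for dense matrix multiplication in the spirit of the M2C design: MPI (as provided by MVAPICH2) scatters row blocks of one matrix across processes (coarse-grain parallelism), and each process launches a CUDA kernel that computes its block on the local GPU (fine-grain parallelism). This is an illustrative sketch only, not the authors' M2C implementation; the matrix size N, the naive kernel, and the assumption that N divides evenly by the process count are our own simplifications.

/* Minimal MPI+CUDA sketch (not the authors' M2C code): MPI scatters row blocks
 * of A (coarse grain); each rank multiplies its block on the GPU (fine grain). */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024  /* illustrative matrix dimension, assumed divisible by the process count */

/* Fine-grain parallelism: one GPU thread per element of the local C block. */
__global__ void block_matmul(const float *a, const float *b, float *c, int rows, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;  /* row block owned by this rank */
    float *A = NULL, *C = NULL;
    float *B     = (float *)malloc((size_t)N * N * sizeof(float));
    float *A_blk = (float *)malloc((size_t)rows * N * sizeof(float));
    float *C_blk = (float *)malloc((size_t)rows * N * sizeof(float));

    if (rank == 0) {  /* root initializes the full A and B matrices */
        A = (float *)malloc((size_t)N * N * sizeof(float));
        C = (float *)malloc((size_t)N * N * sizeof(float));
        for (size_t i = 0; i < (size_t)N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }
    }

    /* Coarse-grain parallelism: scatter row blocks of A, broadcast B. */
    MPI_Scatter(A, rows * N, MPI_FLOAT, A_blk, rows * N, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);

    /* Move the local block to the GPU and launch the kernel. */
    float *dA, *dB, *dC;
    cudaMalloc(&dA, (size_t)rows * N * sizeof(float));
    cudaMalloc(&dB, (size_t)N * N * sizeof(float));
    cudaMalloc(&dC, (size_t)rows * N * sizeof(float));
    cudaMemcpy(dA, A_blk, (size_t)rows * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, (size_t)N * N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((N + 15) / 16, (rows + 15) / 16);
    block_matmul<<<blocks, threads>>>(dA, dB, dC, rows, N);
    cudaMemcpy(C_blk, dC, (size_t)rows * N * sizeof(float), cudaMemcpyDeviceToHost);

    /* Gather the computed row blocks of C back on the root. */
    MPI_Gather(C_blk, rows * N, MPI_FLOAT, C, rows * N, MPI_FLOAT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("C[0] = %f (expected %f)\n", C[0], 2.0f * N);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(B); free(A_blk); free(C_blk);
    if (rank == 0) { free(A); free(C); }
    MPI_Finalize();
    return 0;
}

A production version would overlap MPI communication with GPU computation and replace the naive kernel with a tuned GEMM, which is the basis on which comparisons against libraries such as cuBLAS and kBLAS are normally made.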
