An approach for an efficient execution of SPMD applications on Multi-core environments

Executing traditional Message Passing Interface (MPI) applications on multi-core clusters while balancing speed and computational efficiency is a difficult task for parallel programmers. For this reason, communications on multi-core clusters must be handled carefully in order to improve performance metrics such as efficiency, speedup, execution time and scalability. In this paper we focus on SPMD (Single Program Multiple Data) applications with high communication volume and synchronicity that are also static, local and regular. We propose a method for SPMD applications that manages the communication heterogeneity (different cache levels, RAM, network, etc.) of homogeneous multi-core computing platforms in order to improve application efficiency. The main objective of this work is to determine analytically the ideal number of cores that yields the maximum speedup while computational efficiency is kept above a defined threshold (strong scalability). The method also determines how the problem size must be increased in order to keep the execution time constant as the number of cores grows (weak scalability), considering the tradeoff between speed and efficiency. The methodology has been tested with different benchmarks and applications, and we achieved an average efficiency improvement of around 30.35% in the applications tested, using different problem sizes and multi-core clusters. In addition, the results show that the maximum speedup at a defined efficiency lies close to the values calculated with our analytical model, with an error rate lower than 5% for the applications tested.
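The strong-scalability objective stated above can be illustrated with the standard definitions of speedup, S(p) = T(1) / T(p), and efficiency, E(p) = S(p) / p. The following is a minimal sketch only, not the analytical model of this paper: it assumes that execution times have already been measured for a few core counts, and the function names and the timing values used to exercise it are hypothetical.

```python
from typing import Dict, Optional, Tuple


def speedup_and_efficiency(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each core count p to (speedup, efficiency).

    `times` maps a core count to a measured execution time and must
    contain the single-core baseline under key 1 (an assumption of
    this sketch).
    """
    t1 = times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in times.items()}


def best_core_count(times: Dict[int, float], threshold: float) -> Optional[int]:
    """Return the core count with the highest speedup whose efficiency
    is at least `threshold`, or None if no core count qualifies.

    Ties in speedup are resolved in favour of fewer cores.
    """
    metrics = speedup_and_efficiency(times)
    candidates = [(s, p) for p, (s, e) in metrics.items() if e >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda sp: (sp[0], -sp[1]))[1]


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only for illustration.
    measured = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0, 16: 12.0}
    for p, (s, e) in sorted(speedup_and_efficiency(measured).items()):
        print(f"cores={p:2d}  speedup={s:5.2f}  efficiency={e:4.2f}")
    print("selected cores at threshold 0.85:", best_core_count(measured, 0.85))
```

This sketch selects among measured configurations, whereas the method described in the abstract computes the ideal number of cores analytically, before execution.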
