Design of efficient Java message-passing collectives on multi-core clusters

This paper presents a scalable and efficient Message-Passing in Java (MPJ) collective communication library for parallel computing on multi-core architectures. The continuous increase in the number of cores per processor underscores the need for scalable parallel solutions. Moreover, current system deployments are usually multi-core clusters, a hybrid shared/distributed memory architecture that increases the complexity of communication protocols. In this scenario, Java is an attractive choice for developing communication middleware for these systems, as it provides built-in networking and multithreading support. As the performance gap between Java and natively compiled languages has been narrowing in recent years, Java is an emerging option for High Performance Computing (HPC). Our MPJ collective communication library improves the performance of Java HPC applications on multi-core clusters by: (1) providing multi-core-aware collective primitives; (2) implementing several algorithms (up to six) per collective operation, whereas publicly available MPJ libraries are usually restricted to one; (3) analyzing the efficiency of thread-based collective operations; (4) selecting at runtime the most efficient algorithm depending on the specific multi-core system architecture and on the number of cores and the message length involved in the collective operation; (5) supporting automatic performance tuning of the collectives depending on system and communication parameters; and (6) allowing its integration into any MPJ implementation, as it relies only on MPJ point-to-point primitives. A performance evaluation on a multi-core cluster with InfiniBand and Gigabit Ethernet interconnects has shown that the implemented collectives significantly outperform the original ones, and that their use yields higher speedups in Java HPC applications that are intensive in collective communications. Finally, the presented library has been successfully integrated into MPJ Express (http://mpj-express.org) and will be distributed with its next release.
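To make the runtime algorithm-selection idea concrete, below is a minimal Java sketch of a broadcast built solely on MPJ point-to-point primitives (mpiJava 1.2-style Send/Recv, as used by MPJ Express), choosing between a flat tree and a binomial tree when the collective is invoked. The class name TunedBcast, the FLAT_TREE_MAX_PROCS threshold, and the selection policy are illustrative assumptions, not the library's actual implementation.

```java
import mpi.Intracomm;
import mpi.MPI;

/*
 * Illustrative sketch only: a broadcast collective built exclusively on
 * MPJ point-to-point primitives (mpiJava 1.2-style Send/Recv), selecting
 * its algorithm at runtime. Class name, threshold and selection policy
 * are assumptions for illustration, not the library's actual code.
 */
public class TunedBcast {

    // Assumed threshold: up to this communicator size a flat tree is used.
    private static final int FLAT_TREE_MAX_PROCS = 4;

    public static void bcast(Intracomm comm, byte[] buf, int root) throws Exception {
        int size = comm.Size();
        // Runtime algorithm selection. A full library would also branch on
        // message length (e.g. a scatter + allgather broadcast for long
        // messages, omitted here) and on the node/core topology.
        if (size <= FLAT_TREE_MAX_PROCS) {
            flatTreeBcast(comm, buf, root);
        } else {
            binomialTreeBcast(comm, buf, root);
        }
    }

    // Flat tree: the root sends the whole message to every other process.
    private static void flatTreeBcast(Intracomm comm, byte[] buf, int root) throws Exception {
        int rank = comm.Rank();
        int size = comm.Size();
        if (rank == root) {
            for (int dst = 0; dst < size; dst++) {
                if (dst != root) {
                    comm.Send(buf, 0, buf.length, MPI.BYTE, dst, 0);
                }
            }
        } else {
            comm.Recv(buf, 0, buf.length, MPI.BYTE, root, 0);
        }
    }

    // Binomial tree: each process receives once from its parent and then
    // forwards the message to at most log2(size) children.
    private static void binomialTreeBcast(Intracomm comm, byte[] buf, int root) throws Exception {
        int rank = comm.Rank();
        int size = comm.Size();
        int relRank = (rank - root + size) % size;  // rank relative to the root
        int mask = 1;
        while (mask < size) {               // receive from the parent
            if ((relRank & mask) != 0) {
                int src = (relRank - mask + root) % size;
                comm.Recv(buf, 0, buf.length, MPI.BYTE, src, 0);
                break;
            }
            mask <<= 1;
        }
        mask >>= 1;
        while (mask > 0) {                  // forward to the children
            if (relRank + mask < size) {
                int dst = (relRank + mask + root) % size;
                comm.Send(buf, 0, buf.length, MPI.BYTE, dst, 0);
            }
            mask >>= 1;
        }
    }
}
```

With MPJ Express, such a method could be called, for example, as TunedBcast.bcast(MPI.COMM_WORLD, data, 0) between MPI.Init(args) and MPI.Finalize().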
