Single-port and multi-port collective communication operations on single and dual Cell BE processor systems

Recently, several factors have been driving high-performance processor architectures toward designs that integrate multiple processing cores on a single chip, known as chip multiprocessors (CMPs). The Cell Broadband Engine (Cell BE) has the potential to deliver high performance to parallel applications, such as MPI applications. An efficient implementation of collective communication operations is key to achieving high performance and scalability in parallel applications. In this work, we implement several collective communication operations and investigate their performance in terms of latency and its components. Specifically, we implement the broadcast, all-gather, and total-exchange operations on the Cell BE processor.
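To make the single-port versus multi-port distinction concrete, the sketch below computes a broadcast schedule under a simplified model. This model is an illustrative assumption, not the algorithm used in this work. In it, each of the N processes that already holds the message may forward it to up to k new processes per step, with k = 1 giving the single-port case. For example, with one process per SPE, N = 8 on one Cell BE and N = 16 on a dual-Cell system. The functions `broadcast_steps` and `kport_broadcast_schedule` are hypothetical names. The model ignores DMA setup cost, message size, and contention on the Element Interconnect Bus.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating an idealized k-port broadcast model.
# Assumption: in each step, every process that already holds the message
# may forward it to up to `ports` new processes (ports=1 is single-port).
# Transfer cost, message size, and interconnect contention are ignored.


def broadcast_steps(num_nodes: int, ports: int) -> int:
    """Number of steps needed, i.e. ceil(log_{ports+1}(num_nodes))."""
    if num_nodes < 1 or ports < 1:
        raise ValueError("num_nodes and ports must be positive")
    steps, have = 0, 1
    while have < num_nodes:
        have *= ports + 1  # every holder reaches `ports` new processes
        steps += 1
    return steps


def kport_broadcast_schedule(
    num_nodes: int, ports: int, root: int = 0
) -> List[List[Tuple[int, int]]]:
    """Per-step lists of (sender, receiver) rank pairs for the broadcast."""
    if num_nodes < 1 or ports < 1:
        raise ValueError("num_nodes and ports must be positive")
    if not 0 <= root < num_nodes:
        raise ValueError("root must be a valid rank")
    schedule: List[List[Tuple[int, int]]] = []
    have = 1  # relative ranks 0..have-1 (root is relative rank 0) hold the data
    while have < num_nodes:
        step = []
        for sender in range(have):
            for p in range(1, ports + 1):
                receiver = sender + p * have  # disjoint block per port index
                if receiver < num_nodes:
                    # Map relative ranks back to absolute ranks.
                    step.append(((sender + root) % num_nodes,
                                 (receiver + root) % num_nodes))
        schedule.append(step)
        have = min(num_nodes, have * (ports + 1))
    return schedule


if __name__ == "__main__":
    # Assumed process counts: one process per SPE, 8 SPEs per Cell BE.
    for n in (8, 16):
        for k in (1, 2, 3):
            print(f"N={n:2d}, k={k}: {broadcast_steps(n, k)} steps")
    for i, step in enumerate(kport_broadcast_schedule(8, 1), start=1):
        print(f"step {i}: {step}")
```

With k = 1 the schedule is a binomial tree that needs ceil(log2 N) steps. Allowing more concurrent sends per step shortens the critical path, which is the intuition behind multi-port variants. On real hardware, however, the concurrent transfers share bandwidth, so fewer steps need not mean proportionally lower latency.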
