The Cell is a heterogeneous multicore processor that has attracted much attention in the HPC community. The bulk of the computational workload on the Cell processor is carried by eight co-processors called Synergistic Processing Elements (SPEs). The SPEs are connected to each other and to main memory by a high-speed bus called the Element Interconnect Bus (EIB), which has a peak bandwidth of 204.8 GB/s. However, access to main memory is limited to 25.6 GB/s by the performance of the Memory Interface Controller (MIC). It is therefore advantageous to structure algorithms so that SPEs communicate directly with one another over the EIB and make less use of main memory. We show that, in many realistic communication patterns, the bandwidth actually obtained for inter-SPE communication is strongly influenced by the assignment of threads to SPEs (thread-SPE affinity). We identify the bottlenecks to optimal performance and use this information to determine good affinities for common communication patterns. Our solutions improve performance by up to a factor of two over the default assignment. We also discuss the optimization of affinity on a Cell blade consisting of two Cell processors, and provide a software tool to assist with it. Our results will help Cell application developers choose good affinities for their applications.
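The bandwidth argument and the notion of thread-SPE affinity can be illustrated with a small sketch. The two peak figures come from the abstract; everything else is an assumption made for illustration only: the eight SPEs are modelled as slots on a single hypothetical ring, and the hop count between communicating threads is used as a rough proxy for contention. This is not the bottleneck model used in the paper, and the names `ring_distance`, `total_hops`, `identity`, and `scattered` are hypothetical.

```python
EIB_PEAK_GBPS = 204.8  # peak EIB bandwidth quoted in the abstract
MIC_PEAK_GBPS = 25.6   # peak main-memory bandwidth quoted in the abstract
NUM_SPES = 8


def bandwidth_ratio():
    """How many times faster the EIB peak is than the memory interface peak."""
    return EIB_PEAK_GBPS / MIC_PEAK_GBPS


def memory_share_per_spe():
    """Memory bandwidth per SPE if all eight stream from main memory at once."""
    return MIC_PEAK_GBPS / NUM_SPES


def ring_distance(a, b, n=NUM_SPES):
    """Shortest hop count between two slots on an n-slot ring (toy assumption)."""
    d = abs(a - b) % n
    return min(d, n - d)


def total_hops(affinity, pattern):
    """Sum of hop counts for a communication pattern under a given affinity.

    affinity[t] is the (hypothetical) ring slot assigned to thread t;
    pattern is a list of (source_thread, destination_thread) pairs.
    """
    return sum(ring_distance(affinity[s], affinity[d]) for s, d in pattern)


# Ring pattern: thread t sends to thread t + 1 (wrapping around).
ring_pattern = [(t, (t + 1) % NUM_SPES) for t in range(NUM_SPES)]

identity = tuple(range(NUM_SPES))      # thread t placed on slot t
scattered = (0, 4, 1, 5, 2, 6, 3, 7)   # neighbouring threads placed far apart

if __name__ == "__main__":
    print("EIB / MIC peak ratio:", bandwidth_ratio())
    print("Per-SPE memory share (GB/s):", memory_share_per_spe())
    print("Hops, identity affinity:", total_hops(identity, ring_pattern))
    print("Hops, scattered affinity:", total_hops(scattered, ring_pattern))
```

Under this toy model the same communication pattern costs far more hops with the scattered placement than with the identity placement, which is the kind of sensitivity to affinity the abstract describes. The real EIB has several rings and its own arbitration rules, so actual bandwidth has to be measured rather than inferred from hop counts.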