Evaluating application mapping scenarios on the Cell-B.E.

Applications running on multicore platforms are difficult to program and even more difficult to optimize, mainly because of (1) the many layers at which optimizations can be applied and (2) the multitude of resources available to be exploited in parallel. Whereas low-level optimizations target only the code running on individual cores, high-level optimizations (e.g., data- and task-parallelism) target the overall application performance. In this paper, we focus on the latter by evaluating possible mapping scenarios of a real application on a heterogeneous multicore processor. Specifically, we analyze the impact of combining data- and task-parallelism for a multimedia analysis application running on the Cell Broadband Engine (Cell-B.E.). We find that both low-level and high-level optimizations are important for the overall application speed-up. However, we show that a speed-up factor of over 20 for the application running on the Cell-B.E. can be obtained only if core utilization is increased by combining data- and task-parallelism. We therefore consider this case study essential for building expertise in both application optimization and performance analysis for multicore platforms.

Copyright © 2008 John Wiley & Sons, Ltd.

The work presented here was mostly done at the IBM TJ Watson Research Center, U.S.A., and was partly supported by the Scalp project funded by STW-Progress, The Netherlands.
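The idea of raising core utilization by combining the two forms of parallelism can be illustrated with a minimal, portable C++ sketch. It is not the authors' implementation and does not use the Cell SDK. Plain `std::thread` workers stand in for SPEs, and the names `Kernel`, `run_data_parallel`, and `run_combined` are hypothetical. Independent analysis kernels run concurrently (task parallelism), and each kernel splits its input across its own share of the worker budget (data parallelism).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for one analysis kernel (e.g., a feature extractor):
// it processes the slice [begin, end) of the input and returns a partial result.
using Kernel = std::function<double(const std::vector<float>&, std::size_t, std::size_t)>;

// Data parallelism: split one kernel's input into `workers` contiguous slices,
// process each slice on its own thread, then combine the partial results.
double run_data_parallel(const Kernel& k, const std::vector<float>& data, unsigned workers) {
    assert(workers > 0);
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> threads;
    const std::size_t n = data.size();
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = n * w / workers;
        const std::size_t end = n * (w + 1) / workers;
        threads.emplace_back([&, w, begin, end] { partial[w] = k(data, begin, end); });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// Task parallelism on top: run independent kernels concurrently, giving each
// an equal share of the worker budget (leftover workers stay idle). The
// per-kernel thread only coordinates its workers, loosely analogous to a
// control processor orchestrating accelerator cores.
std::vector<double> run_combined(const std::vector<Kernel>& kernels,
                                 const std::vector<float>& data,
                                 unsigned total_workers) {
    assert(!kernels.empty() && total_workers >= kernels.size());
    const unsigned per_kernel = static_cast<unsigned>(total_workers / kernels.size());
    std::vector<double> results(kernels.size(), 0.0);
    std::vector<std::thread> tasks;
    for (std::size_t i = 0; i < kernels.size(); ++i) {
        tasks.emplace_back([&, i] { results[i] = run_data_parallel(kernels[i], data, per_kernel); });
    }
    for (auto& t : tasks) t.join();
    return results;
}
```

For example, with four independent kernels and a budget of eight workers (the Cell-B.E. has eight SPEs), each kernel would process its input in two slices. Whether such a split actually improves utilization depends on how well the kernels' run times balance, which this sketch does not model.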