Particle simulation on the Cell BE architecture

This paper presents two parallel formulations of the Barnes-Hut algorithm on the Cell Broadband Engine (Cell BE) architecture. The two differ in how the tree is distributed and constructed. In the initial parallelization, the domains are dynamically partitioned and assigned to the synergistic processing elements (SPEs), which construct local trees of their sub-domains in parallel. The enhanced parallelization scheme clusters the particles better: the power processing element (PPE) sequentially constructs the global tree of the entire work space and partitions it into sub-trees that fit in the Local Store. The SPEs then operate on the sub-tree data and construct local trees in parallel. Our experimental evaluation indicates that the application runs substantially faster on the Cell BE than on an Intel Xeon-based system: for 8192 particles, our first and second methods outperform the Intel Xeon by factors of 5.8 and 7.1, respectively.
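
The tree-partitioning step of the enhanced scheme can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a 2D quadtree in plain Python, and the names `Node`, `build_tree`, `partition`, and the `capacity` parameter (a particle count standing in for the Local Store size) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    cx: float                                      # cell centre, x
    cy: float                                      # cell centre, y
    half: float                                    # half the cell side length
    particles: list = field(default_factory=list)  # (x, y, mass) tuples inside the cell
    children: list = field(default_factory=list)   # non-empty child quadrants

def build_tree(particles, cx, cy, half, leaf_size=1, depth=0, max_depth=32):
    """Sequentially build a quadtree over `particles` (the role the PPE plays)."""
    node = Node(cx, cy, half, list(particles))
    # Stop splitting at small cells; max_depth guards against coincident points.
    if len(particles) <= leaf_size or depth >= max_depth:
        return node
    quads = [[], [], [], []]
    for p in particles:
        idx = (1 if p[0] >= cx else 0) + (2 if p[1] >= cy else 0)
        quads[idx].append(p)
    h = half / 2
    for idx, sub in enumerate(quads):
        if sub:
            ccx = cx + (h if idx & 1 else -h)
            ccy = cy + (h if idx & 2 else -h)
            node.children.append(
                build_tree(sub, ccx, ccy, h, leaf_size, depth + 1, max_depth))
    return node

def partition(node, capacity):
    """Cut the tree into sub-trees holding at most `capacity` particles each.

    `capacity` is an assumed stand-in for what fits in one SPE's Local Store.
    A leaf that still exceeds `capacity` is returned as-is.
    """
    if len(node.particles) <= capacity or not node.children:
        return [node]
    parts = []
    for child in node.children:
        parts.extend(partition(child, capacity))
    return parts

# Usage: 16 particles on a regular grid in the unit square, unit mass.
grid = [((i + 0.5) / 4, (j + 0.5) / 4, 1.0) for i in range(4) for j in range(4)]
root = build_tree(grid, 0.5, 0.5, 0.5)
subtrees = partition(root, capacity=4)
```

Each returned sub-tree root is a candidate unit of work for one SPE. Because the cut follows the cells of the global tree, spatially close particles stay in the same sub-tree, which is the clustering property the enhanced scheme relies on.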
