Paper information - On the random access performance of Cell Broadband Engine with graph analysis application


The Cell Broadband Engine (Cell/BE) processor has a unique memory access architecture in addition to its powerful computing engines. Many compute-intensive applications have been ported to Cell/BE successfully, but memory-intensive applications have rarely been investigated beyond a few micro-benchmarks. Since Cell/BE has a powerful software-visible DMA engine, this paper studies whether Cell/BE is suited to applications with a large number of random memory accesses. Two benchmarks, GUPS and SSCA#2, are used. The latter is a rather complex benchmark that is representative of real-world graph analysis applications. We find that both benchmarks perform well on the Cell/BE-based IBM QS20/22. Compared with two conventional multi-processor systems with the same core/thread count, GUPS is about 40-80% faster and SSCA#2 about 17-30% faster. The dynamic load balancing and software pipelining used to optimize SSCA#2 are introduced. Based on the experiments, the potential of Cell/BE for random access is analyzed in detail, as are the limitations of its memory controller, atomic engine, and TLB management. Our research shows that, although more programming effort is needed, Cell/BE has potential for irregular memory access applications.

David A. Bader | Seunghwa Kang | Mingyu Chen
