A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor

This paper describes the implementation of a runtime library for asynchronous communication on the Cell BE processor. The library provides several services that allow the compiler to generate code that maximizes the chances of overlapping communication with computation. It is organized as a software cache, and its main services are mechanisms for data lookup, data placement and replacement, data write-back, memory synchronization, and address translation. The implementation guarantees that these services can be fully decoupled when handling memory references, which gives the compiler the opportunity to organize the generated code so that computation overlaps with communication as much as possible. The paper also describes the mechanism needed to overlap the communication caused by write-back operations with actual computation, and it presents the basic compiler algorithms and optimizations for code generation. The system is evaluated by measuring bandwidth and global update ratios with two benchmarks from the HPCC benchmark suite: Stream and Random Access.
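To make the idea of decoupled services concrete, the following is a minimal host-side sketch in C, not the paper's library. It assumes a direct-mapped cache of four 128-byte lines and simulates DMA with `memcpy`: a transfer is "issued" in `sc_place` and only "completes" in `sc_sync`, and write-back is performed immediately rather than asynchronously. All names (`sc_lookup`, `sc_place`, `sc_sync`, `sc_translate`, `sc_write_back`, `sc_store`, `sum_two`) and all sizes are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 128u   /* assumed cache line size */
#define NUM_LINES  4u     /* assumed direct-mapped cache with 4 lines */
#define MEM_BYTES  4096u  /* size of the simulated global memory */

static uint8_t main_memory[MEM_BYTES];   /* stands in for global memory */

typedef struct {
    uint32_t base;               /* global address of the cached line */
    int valid, dirty, pending;   /* pending: get issued, not yet completed */
    uint8_t data[LINE_BYTES];    /* stands in for local store */
} cache_line;

static cache_line cache[NUM_LINES];

static unsigned line_index(uint32_t addr) { return (addr / LINE_BYTES) % NUM_LINES; }
static uint32_t line_base(uint32_t addr)  { return addr - addr % LINE_BYTES; }

/* Look up: a pure directory check that triggers no communication. */
static int sc_lookup(uint32_t addr)
{
    const cache_line *l = &cache[line_index(addr)];
    return l->valid && l->base == line_base(addr);
}

/* Write back: copy a dirty line to global memory (immediate in this sketch). */
static void sc_write_back(unsigned idx)
{
    cache_line *l = &cache[idx];
    if (l->valid && l->dirty) {
        memcpy(&main_memory[l->base], l->data, LINE_BYTES);
        l->dirty = 0;
    }
}

/* Placement and replacement: evict the old line, then issue the get.
   The data is not usable until sc_sync() has been called on the line. */
static unsigned sc_place(uint32_t addr)
{
    unsigned idx = line_index(addr);
    assert(addr < MEM_BYTES);
    sc_write_back(idx);
    cache[idx].base = line_base(addr);
    cache[idx].valid = 1;
    cache[idx].pending = 1;
    return idx;
}

/* Synchronization: complete a pending (simulated) transfer. */
static void sc_sync(unsigned idx)
{
    cache_line *l = &cache[idx];
    if (l->pending) {
        memcpy(l->data, &main_memory[l->base], LINE_BYTES);
        l->pending = 0;
    }
}

/* Address translation: map a global address to its local copy. */
static uint8_t *sc_translate(uint32_t addr)
{
    return &cache[line_index(addr)].data[addr % LINE_BYTES];
}

/* Store one byte through the cache and mark the line dirty. */
static void sc_store(uint32_t addr, uint8_t value)
{
    unsigned idx = sc_lookup(addr) ? line_index(addr) : sc_place(addr);
    sc_sync(idx);
    *sc_translate(addr) = value;
    cache[idx].dirty = 1;
}

/* Decoupled use: issue both placements first, synchronize later.
   Assumes a and b map to different cache lines. */
static unsigned sum_two(uint32_t a, uint32_t b)
{
    unsigned ia = sc_lookup(a) ? line_index(a) : sc_place(a);
    unsigned ib = sc_lookup(b) ? line_index(b) : sc_place(b);
    /* independent computation could be scheduled here */
    sc_sync(ia);
    sc_sync(ib);
    return (unsigned)*sc_translate(a) + *sc_translate(b);
}
```

In `sum_two`, both placements are issued before either synchronization, which is the kind of reordering that separate look-up, placement, and synchronization calls make possible. On real hardware the gap between issue and synchronization would be filled with useful computation while the DMA transfers are in flight.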
