Compiler manipulation of stream descriptors for data access optimization

Efficient data movement is a key requirement for high-performance computing. This paper advocates the use of stream descriptors to convey memory access patterns from the programmer to the compiler. This explicit separation of computation from data movement lets the compiler manipulate the stream descriptors to match the system's interconnect capabilities. The compiler optimizes data movement by transforming the descriptors to target specific goals such as bandwidth management and buffer allocation. Bandwidth improvements are shown for an example system that performs video analysis using computer vision methods. The system includes key hardware mechanisms that use stream descriptors to prefetch and align data for stream processors.
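To make the idea of a stream descriptor concrete, the following is a minimal sketch and not the paper's definition. The field names (`start`, `stride`, `span`, `skip`, `count`) and the helper `split_for_buffer` are assumptions chosen for illustration. The sketch shows how an access pattern can be stated separately from the computation, and how a compiler-like pass might rewrite one descriptor into several smaller ones so that each fits a fixed-size buffer.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class StreamDescriptor:
    """Hypothetical descriptor; the field names are illustrative assumptions."""
    start: int   # address of the first element
    stride: int  # address step between elements inside one span
    span: int    # number of elements in one span (e.g. one image row)
    skip: int    # address step from the last element of a span to the next span
    count: int   # total number of elements in the stream

    def addresses(self) -> Iterator[int]:
        """Yield the element addresses in stream order."""
        addr = self.start
        for i in range(self.count):
            yield addr
            # At the end of a span jump by `skip`, otherwise step by `stride`.
            last_in_span = (i + 1) % self.span == 0
            addr += self.skip if last_in_span else self.stride


def split_for_buffer(desc: StreamDescriptor, buffer_elems: int) -> List[StreamDescriptor]:
    """Split a descriptor into chunks of whole spans that fit `buffer_elems`.

    This stands in for one kind of descriptor manipulation (buffer allocation);
    it is an assumed transformation, not the paper's algorithm.
    """
    spans_per_chunk = buffer_elems // desc.span
    if spans_per_chunk < 1:
        raise ValueError("buffer must hold at least one full span")
    # Address distance between the first elements of two consecutive spans.
    span_pitch = (desc.span - 1) * desc.stride + desc.skip
    chunk_elems = spans_per_chunk * desc.span
    chunks = []
    for first in range(0, desc.count, chunk_elems):
        chunks.append(StreamDescriptor(
            start=desc.start + (first // desc.span) * span_pitch,
            stride=desc.stride,
            span=desc.span,
            skip=desc.skip,
            count=min(chunk_elems, desc.count - first),
        ))
    return chunks


# A 4-wide, 3-row region of interest inside an 8-wide image, starting at address 10.
roi = StreamDescriptor(start=10, stride=1, span=4, skip=5, count=12)
print(list(roi.addresses()))
for part in split_for_buffer(roi, buffer_elems=8):
    print(part)
```

Because the access pattern lives in the descriptor rather than in loop code, the split changes only how data is moved; the concatenated address sequence of the chunks is the same as that of the original descriptor.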
