Stream Memory Subsystem in Reconfigurable Platforms

High-performance computing platforms require an efficient memory subsystem to keep processors busy. This paper proposes a memory hierarchy that uses stream units to move stream data between memory and processors. The stream units prefetch and align data according to stream descriptors, a mechanism that lets programmers specify data movement explicitly by describing their memory access patterns. Because the stream units are built in reconfigurable logic, different memory hierarchy configurations can be explored to match application needs. The paper also presents an example stream unit design with preliminary synthesis results.
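
To make the idea of a stream descriptor concrete, the following is a minimal sketch of how a descriptor might encode an access pattern and how a stream unit could expand it into an address sequence. The abstract does not define the descriptor format, so the class name and the fields `start`, `stride`, `span`, `skip`, and `elem_size` are assumptions chosen for illustration, not the paper's specification.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class StreamDescriptor:
    """Hypothetical descriptor; all field names are illustrative assumptions."""
    start: int      # byte address of the first element
    stride: int     # element distance between consecutive accesses in a group
    span: int       # number of accesses in a group before the skip is applied
    skip: int       # element distance from the last access of one group
                    # to the first access of the next group
    elem_size: int  # element size in bytes

    def addresses(self, count: int) -> Iterator[int]:
        """Yield the byte addresses of the first `count` stream elements."""
        addr = self.start
        for i in range(count):
            yield addr
            # At the end of each group of `span` accesses, jump by `skip`;
            # otherwise advance by `stride`.
            if (i + 1) % self.span == 0:
                addr += self.skip * self.elem_size
            else:
                addr += self.stride * self.elem_size


# Example: a 3-wide, 2-high window in a row-major image that is
# 8 one-byte pixels wide. The skip of 6 moves from the last pixel of
# one window row to the first pixel of the next.
window = StreamDescriptor(start=0, stride=1, span=3, skip=6, elem_size=1)
window_addresses = list(window.addresses(6))
```

Under these assumptions, a prefetcher driven by such a descriptor knows every future address without waiting for the processor to issue loads, which is what would allow it to fetch and align data ahead of use.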
