Managing Short-Lived and Long-Lived Values in Coarse-Grained Reconfigurable Arrays

Efficient storage in spatial processors is increasingly important as these devices grow larger and support more concurrent operations. Unlike sequential processors, which rely heavily on centralized storage such as register files and embedded memories, spatial processors require many small storage structures to efficiently manage values that are distributed throughout the processor's fabric. The goal of this work is to determine the advantages and disadvantages of different architectural structures for storing values on-chip when optimizing for both energy efficiency and area. Examination of applications for coarse-grained reconfigurable arrays (CGRAs) shows that most values are short-lived: they are produced and consumed quickly. The distribution of value lifetimes nevertheless has a reasonably long tail. We take advantage of this distribution to optimize register storage structures for managing short-, medium-, and long-lived values. We show that using a combination of register storage structures, each tailored to values with different lifetimes, reduces the overall area-energy product to 0.69x that of the baseline architecture, without loss of performance. Finally, we provide energy profiles, characteristics, and comparisons of each register structure to help architects guide future design choices.
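To make the notion of value lifetime concrete, the following is a minimal sketch, not the method used in this work. It assumes a value's lifetime is the number of cycles between its production and its last consumer, and it bins lifetimes using illustrative thresholds. The function names, the thresholds (`short_max`, `medium_max`), and the sample schedule are all hypothetical.

```python
from collections import Counter

def lifetime(produced_at, consumed_at):
    """Cycles a value must be held: from production until its last consumer."""
    return max(consumed_at) - produced_at

def classify(cycles, short_max=2, medium_max=8):
    """Bin a lifetime; both thresholds are illustrative assumptions."""
    if cycles <= short_max:
        return "short"
    if cycles <= medium_max:
        return "medium"
    return "long"

def lifetime_histogram(values):
    """Count values per lifetime class for (produce_cycle, consume_cycles) pairs."""
    return Counter(classify(lifetime(p, c)) for p, c in values)

# Hypothetical schedule: each entry is (cycle produced, cycles at which it is read).
schedule = [(0, [1]), (1, [2, 3]), (2, [3]), (3, [9]), (4, [40])]
hist = lifetime_histogram(schedule)
print(dict(hist))
```

In this made-up schedule most values fall into the short bin and a single value sits far out in the tail, which is the kind of distribution that motivates pairing each lifetime class with a different storage structure.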
