Coarse-Grained Reconfigurable Architectures

Current trends in technology scaling, coupled with increasing compute demands under a limited power budget, have spurred research into specialized accelerator architectures. Coarse-Grained Reconfigurable Architectures (CGRAs) have been shown to achieve higher performance and energy efficiency than conventional instruction-based architectures by avoiding instruction overheads through reconfigurable data and control paths. By raising the level of hardware abstraction, CGRAs also avoid the hardware and programming overheads of fine-grained alternatives such as Field-Programmable Gate Arrays (FPGAs). Designing an efficient CGRA requires carefully calibrating the granularity of its elements and building an automated compilation flow that maps high-level programs onto those reconfigurable elements. This chapter reviews the challenges and opportunities in the field of CGRAs.

[38]  Carl Ebeling,et al.  Architecture design of reconfigurable pipelined datapaths , 1999, Proceedings 20th Anniversary Conference on Advanced Research in VLSI.