Data-Flow Graph Mapping Optimization for CGRA With Deep Reinforcement Learning

Coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their flexibility and energy efficiency. Data flow graphs (DFGs) are often mapped onto CGRAs for acceleration. DFG mapping is challenging because DFG structures are diverse and CGRA hardware resources are constrained. Consequently, it is difficult to find a solution that is both valid and of high quality. Inspired by the great progress of deep reinforcement learning (RL) on AI problems, we consider building methods that learn to map DFGs onto spatially programmed CGRAs directly from experience. We propose RLMap, a solution that formulates DFG mapping on a CGRA as an RL agent problem and unifies placement, routing, and processing element (PE) insertion through interchange actions of the agent. Experimental results show that RLMap performs comparably to state-of-the-art heuristics in mapping quality, adapts to different architectures, and converges quickly.
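To make the idea of an interchange action concrete, the following is a minimal sketch under stated assumptions, not RLMap itself. The 2x2 PE grid, the four-node DFG, the Manhattan wire-length cost, and every function name are hypothetical choices made for illustration, and a one-step greedy search stands in for the learned policy. Routing and PE insertion are not modelled.

```python
from itertools import combinations

# Hypothetical toy DFG (not from the paper): a diamond of four operations.
DFG_EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

# Hypothetical 2x2 PE array; each PE holds one operation or None.
PES = [(0, 0), (0, 1), (1, 0), (1, 1)]
INITIAL_GRID = {(0, 0): "a", (0, 1): "d", (1, 0): "b", (1, 1): "c"}


def placement_of(grid):
    """Invert the grid: map each placed operation to its PE coordinate."""
    return {op: pe for pe, op in grid.items() if op is not None}


def wire_cost(grid, edges):
    """Assumed cost: sum of Manhattan distances over all DFG edges."""
    place = placement_of(grid)
    total = 0
    for src, dst in edges:
        (r1, c1), (r2, c2) = place[src], place[dst]
        total += abs(r1 - r2) + abs(c1 - c2)
    return total


def interchange(grid, pe_a, pe_b):
    """Interchange action: swap the contents of two PEs (either may be empty)."""
    new_grid = dict(grid)
    new_grid[pe_a], new_grid[pe_b] = grid[pe_b], grid[pe_a]
    return new_grid


def greedy_step(grid, pes, edges):
    """Stand-in for a learned policy: try every interchange, keep the best one."""
    best_grid, best_cost = grid, wire_cost(grid, edges)
    for pe_a, pe_b in combinations(pes, 2):
        candidate = interchange(grid, pe_a, pe_b)
        cost = wire_cost(candidate, edges)
        if cost < best_cost:
            best_grid, best_cost = candidate, cost
    return best_grid, best_cost


if __name__ == "__main__":
    print("initial cost:", wire_cost(INITIAL_GRID, DFG_EDGES))
    improved, cost = greedy_step(INITIAL_GRID, PES, DFG_EDGES)
    print("cost after one interchange:", cost)
```

In an RL setting, the cost reduction produced by an interchange could serve as the reward signal, with the agent choosing which pair of PEs to swap instead of enumerating all pairs.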
