Graph minor approach for application mapping on CGRAs

Coarse-grained reconfigurable arrays (CGRAs) offer high performance, improved flexibility, low cost, and power efficiency across various application domains. Compute-intensive loop kernels are mapped to CGRAs through modified modulo scheduling algorithms that integrate placement and routing. Most existing approaches are heavily influenced by VLIW compilation and FPGA synthesis techniques. A salient feature of these approaches is that data routed from a single source node to multiple destination nodes follows independent paths, which wastes resources and hence yields an inefficient schedule. We transform the CGRA mapping problem with route sharing into a graph minor problem. Our graph minor formalization provides a solid foundation for application mapping on CGRAs. We provide an efficient framework based on graph mapping to solve this problem. Experimental validation shows that our approach achieves higher performance than state-of-the-art solutions, with better resource utilization and minimal compilation time.
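To make the graph minor view concrete, the sketch below checks the standard textbook definition of a minor model. Each node of a small graph H (standing in for the dataflow graph) is assigned a disjoint, connected "branch set" of nodes in a larger graph G (standing in for the CGRA resource graph), and every edge of H must be realised by some edge of G between the two branch sets. This is only an illustration under simplifying assumptions: the graphs are treated as undirected, time and modulo constraints are ignored, and the names `is_connected`, `is_minor_model`, `h_edges`, `g_adj`, and `branch_sets` are hypothetical. It is not the mapping framework described in the abstract.

```python
from collections import deque


def is_connected(nodes, adj):
    """Return True if `nodes` induces a connected subgraph of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in adj.get(cur, ()):
            if nxt in nodes and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == nodes


def is_minor_model(h_edges, g_adj, branch_sets):
    """Check that `branch_sets` witnesses H as a minor of G (undirected)."""
    # 1. Branch sets must be pairwise disjoint.
    used = set()
    for nodes in branch_sets.values():
        nodes = set(nodes)
        if used & nodes:
            return False
        used |= nodes
    # 2. Each branch set must be connected in G, so it can be contracted.
    if not all(is_connected(s, g_adj) for s in branch_sets.values()):
        return False
    # 3. Every H edge needs at least one G edge between its branch sets.
    for u, v in h_edges:
        if not any(b in g_adj.get(a, ())
                   for a in branch_sets[u] for b in branch_sets[v]):
            return False
    return True


# Hypothetical example: one source `a` feeding two destinations `b` and `c`.
h_edges = [("a", "b"), ("a", "c")]

# Hypothetical resource graph: p0 - p1, with p1 also linked to p2 and p3.
g_adj = {
    "p0": {"p1"},
    "p1": {"p0", "p2", "p3"},
    "p2": {"p1"},
    "p3": {"p1"},
}

# With route sharing, the source owns p0 plus the shared routing node p1.
shared = {"a": {"p0", "p1"}, "b": {"p2"}, "c": {"p3"}}

# Without a routing node, p0 is not adjacent to p2 or p3.
unrouted = {"a": {"p0"}, "b": {"p2"}, "c": {"p3"}}

shared_ok = is_minor_model(h_edges, g_adj, shared)
unrouted_ok = is_minor_model(h_edges, g_adj, unrouted)
```

In this toy case a single routing node `p1` serves both destinations, which is the kind of route sharing the abstract contrasts with independent per-destination paths. Contracting the branch set `{p0, p1}` into one node recovers the shape of H.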
