Dependency-Aware Parallel Routing for Large-Scale FPGAs

Quantitative effects of Moore's Law have driven qualitative changes in FPGA architecture, applications, and tools. As a consequence, existing EDA tools take several hours or even days to implement applications on FPGAs. Routing is typically one of the most time-consuming stages of the EDA design flow. Several attempts have been made to accelerate routing through parallelization, but they still do not provide a strong parallel scheme for FPGA routing. In this paper we introduce Bamboo, a dependency-aware parallel approach that accelerates FPGA routing. Using dependency detection, Bamboo partitions the nets into multiple subsets such that the nets within a subset are mutually independent and dependencies exist only between different subsets. The independent nets in a subset are routed in parallel, and the subsets are processed serially in the original routing order. The partitioning problem is solved optimally using dynamic programming, and the parallelization is implemented with speculative parallelism on a single GPU. Experimental results show that our approach achieves an average speedup of 15.13x with negligible influence on routing quality. Most importantly, it is deterministic: it always produces the same results as the serial version.
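The partitioning idea can be illustrated with a minimal Python sketch. It is not Bamboo's implementation: the abstract does not define the dependency test or the objective of the dynamic program, so the sketch assumes that two nets are dependent when their bounding boxes overlap and that the goal is the smallest number of subsets, each a contiguous run of the original routing order. The names `BBox`, `overlaps`, and `partition_nets` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical net model: each net is reduced to its bounding box
# (xmin, ymin, xmax, ymax), with inclusive coordinates.
BBox = Tuple[int, int, int, int]


def overlaps(a: BBox, b: BBox) -> bool:
    """Assumed dependency test: two nets depend on each other if their boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partition_nets(boxes: List[BBox]) -> List[List[int]]:
    """Split nets (given in routing order) into the fewest contiguous subsets
    whose members are pairwise independent."""
    n = len(boxes)

    # start[j] = smallest i such that nets i..j are pairwise independent.
    start = [0] * n
    for j in range(n):
        i = start[j - 1] if j > 0 else 0
        # Scan backwards for the closest earlier net that conflicts with net j.
        for k in range(j - 1, i - 1, -1):
            if overlaps(boxes[k], boxes[j]):
                i = k + 1
                break
        start[j] = i

    # best[j] = minimum number of subsets covering the first j nets.
    # cut[j]  = index where the last subset of that optimal solution begins.
    best = [0] + [n + 1] * n
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(start[j - 1], j):
            if best[i] + 1 < best[j]:
                best[j] = best[i] + 1
                cut[j] = i

    # Walk the cut points backwards to recover the subsets in routing order.
    subsets: List[List[int]] = []
    j = n
    while j > 0:
        i = cut[j]
        subsets.append(list(range(i, j)))
        j = i
    subsets.reverse()
    return subsets


example = [(0, 0, 2, 2), (5, 5, 7, 7), (1, 1, 3, 3), (8, 0, 9, 1), (6, 6, 8, 8)]
print(partition_nets(example))
```

For the five example boxes, nets 0 and 2 overlap and nets 1 and 4 overlap, so the sketch produces two subsets, `[0, 1]` and `[2, 3, 4]`. A router following the scheme described above would then process the subsets one after another and route the nets inside each subset concurrently; because every subset preserves the original order and contains no mutually dependent nets, the outcome would match a serial pass under this assumed dependency model.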
