Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion

FPGAs are increasingly popular as application-specific accelerators because they strike a good balance between flexibility and energy efficiency compared to CPUs and ASICs. However, long routing times are a barrier to FPGA computing and significantly hinder design productivity. Existing attempts to parallelize FPGA routing either do not fully exploit the available parallelism or suffer from excessive quality loss. Massive parallelism on GPUs has the potential to solve this problem but faces non-trivial challenges. To address these challenges, this work presents Corolla, a GPU-accelerated FPGA routing method. Corolla makes it possible to apply a GPU-friendly shortest-path algorithm to FPGA routing by reducing the problem size, limiting the search to routing subgraphs. We maintain convergence after this problem-size reduction by dynamically expanding the routing-resource subgraphs. In addition, Corolla exploits fine-grained single-net parallelism and proposes a hybrid approach that combines static and dynamic parallelism on the GPU. To exploit coarse-grained multi-net parallelism, Corolla proposes an effective method for parallelizing multi-net routing while producing routing results equivalent to those of the original single-net routing. Experimental results show that Corolla achieves an average 18.72x speedup on the GPU with a tolerable loss in routing quality and sustains scalable speedup on large-scale routing graphs. To our knowledge, this is the first work to demonstrate the effectiveness of GPU-accelerated FPGA routing.
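The idea of limiting the search to a routing subgraph and expanding it when needed can be illustrated with a small sketch. Assuming a routing-resource graph whose nodes carry (x, y) grid coordinates, the code below runs a frontier-based Bellman-Ford search (a label-correcting shortest-path method whose per-round edge relaxations are independent, which is what makes it GPU-friendly) confined to a bounding box, and enlarges the box whenever the sink cannot be reached inside it. The names `bellman_ford_in_box`, `route_with_expansion`, and `make_grid`, the unit edge costs, and the expansion trigger (sink unreachable) are hypothetical simplifications; this is not Corolla's implementation, and its actual subgraph construction and expansion criteria may differ.

```python
from typing import AbstractSet, Dict, List, Optional, Set, Tuple

Node = Tuple[int, int]                        # (x, y) location of a routing-resource node (illustrative)
Box = Tuple[int, int, int, int]               # (x0, y0, x1, y1), inclusive bounds
Graph = Dict[Node, List[Tuple[Node, float]]]  # node -> [(neighbor, edge cost)]

INF = float("inf")


def in_box(n: Node, box: Box) -> bool:
    x0, y0, x1, y1 = box
    return x0 <= n[0] <= x1 and y0 <= n[1] <= y1


def bellman_ford_in_box(graph: Graph, src: Node, box: Box) -> Dict[Node, float]:
    """Frontier-based Bellman-Ford from src that never leaves `box`.

    Each round relaxes the out-edges of the nodes updated in the previous
    round; on a GPU those relaxations could run one thread per edge with an
    atomic min. Here they run sequentially, which still converges.
    """
    dist: Dict[Node, float] = {src: 0.0}
    frontier: Set[Node] = {src}
    while frontier:
        next_frontier: Set[Node] = set()
        for u in frontier:
            for v, w in graph.get(u, []):
                if not in_box(v, box):
                    continue  # the search is confined to the subgraph
                nd = dist[u] + w
                if nd < dist.get(v, INF):
                    dist[v] = nd
                    next_frontier.add(v)
        frontier = next_frontier
    return dist


def route_with_expansion(graph: Graph, src: Node, sink: Node, box: Box,
                         grid: Box, step: int = 1) -> Optional[Tuple[float, Box]]:
    """Search inside `box`; grow it by `step` until the sink is reached or the box covers `grid`."""
    while True:
        dist = bellman_ford_in_box(graph, src, box)
        if sink in dist:
            return dist[sink], box
        if box == grid:
            return None  # unreachable even in the whole graph
        x0, y0, x1, y1 = box
        box = (max(grid[0], x0 - step), max(grid[1], y0 - step),
               min(grid[2], x1 + step), min(grid[3], y1 + step))


def make_grid(n: int, blocked: AbstractSet[Node] = frozenset()) -> Graph:
    """Unit-cost 4-neighbour n x n grid with some nodes removed (a toy stand-in for a routing graph)."""
    graph: Graph = {}
    for x in range(n):
        for y in range(n):
            if (x, y) in blocked:
                continue
            graph[(x, y)] = [((x + dx, y + dy), 1.0)
                             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= x + dx < n and 0 <= y + dy < n
                             and (x + dx, y + dy) not in blocked]
    return graph


if __name__ == "__main__":
    g = make_grid(5, blocked={(1, 0)})
    # The direct path is blocked, so the initial one-row box must grow once.
    print(route_with_expansion(g, (0, 0), (2, 0), (0, 0, 2, 0), (0, 0, 4, 4)))  # (4.0, (0, 0, 3, 1))
```

Because the box limits the search, the path found inside it can be longer than the best path in the full graph, which is one way subgraph restriction can cost routing quality; expanding the box is what keeps every net routable.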
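The abstract does not spell out how multi-net routing is parallelized while keeping results equivalent to single-net routing. One standard way to obtain that guarantee, shown here only as an illustration and not necessarily Corolla's method, is to assume every net's route stays inside its bounding box and to schedule nets in levels: a net waits only for earlier nets (in the sequential routing order) whose boxes overlap its own. Nets in the same batch then have pairwise disjoint boxes and cannot observe each other's routes, so routing a batch in parallel reproduces the sequential result. The function names and the box-confinement assumption are hypothetical; with dynamically expanding subgraphs, a real router would also have to handle boxes that grow during routing, which this sketch ignores.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive bounds (hypothetical net bounding box)


def overlaps(a: Box, b: Box) -> bool:
    """True if two inclusive axis-aligned boxes share at least one grid point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def schedule_batches(boxes: List[Box]) -> List[List[int]]:
    """Group nets, given in sequential routing order, into parallel batches.

    Net j goes into batch 1 + max(batch of any earlier overlapping net), or
    batch 0 if no earlier net overlaps it. Every pair of overlapping nets thus
    keeps its sequential order, and nets inside one batch are pairwise disjoint.
    """
    level: List[int] = []
    for j, bj in enumerate(boxes):
        lv = 0
        for i in range(j):
            if overlaps(boxes[i], bj):
                lv = max(lv, level[i] + 1)
        level.append(lv)
    batches: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for j, lv in enumerate(level):
        batches[lv].append(j)
    return batches


if __name__ == "__main__":
    nets = [(0, 0, 2, 2), (3, 3, 5, 5), (1, 1, 4, 4), (6, 0, 7, 1)]
    # Net 2 overlaps nets 0 and 1, so it must wait; nets 0, 1, 3 can run together.
    print(schedule_batches(nets))  # [[0, 1, 3], [2]]
```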
