Paper information - Tatum: Parallel Timing Analysis for Faster Design Cycles and Improved Optimization

Static Timing Analysis (STA) is used to evaluate the correctness and performance of a digital circuit implementation. In addition to final sign-off checks, STA is called numerous times during placement and routing to guide optimization. As a result, STA consumes a significant fraction of the time required for design implementation; reducing FPGA compile times therefore requires faster STA. We evaluate the suitability of both GPU and multi-core CPU platforms for accelerating STA. On core STA algorithms, our GPU kernel achieves a 6.2 times kernel speed-up, but data transfer overhead reduces this to 0.9 times. Our best CPU implementation achieves a 9.2 times parallel speed-up on 32 cores, yielding a 15.2 times overall speed-up compared to the VPR analyzer, and a 6.9 times larger parallel speed-up than a recent parallel ASIC timing analyzer. We then show how reducing the run-time cost of STA can be leveraged to improve optimization quality, reducing critical path delay by 4%.

Vaughn Betz | Kevin E. Murray
