Deterministic Timing-Driven Parallel Placement by Simulated Annealing Using Half-Box Window Decomposition

As each generation of FPGAs grows in size, the run time of the associated CAD tools is rapidly increasing. Many past efforts have aimed at improving CAD run time through parallelization of the placement algorithm. Wang and Lemieux presented an algorithm that is scalable, deterministic, timing-driven and achieves speedup over VPR [Wang and Lemieux FPGA'11]. This paper makes two significant alterations to Wang and Lemieux's algorithm, resulting in additional speedup and quality improvement. The first contribution is a new data decomposition scheme, called the half-box window technique, which achieves speedup by reducing the frequency of thread synchronization. The second contribution is an improved annealing schedule, which further improves run time and slightly improves the quality of results. Together, these modifications achieve run time speedups of up to 70%. To put this in perspective, Wang and Lemieux required 25 threads to achieve their best speedup, while this work requires only 16 threads. For a 10% degradation in quality, the new 16-thread algorithm achieves a 51x speedup over VPR, compared to a 35x speedup for the 25-thread original algorithm. Regarding quality, the best quality of results achieved by the new algorithm is a 5% degradation versus VPR, compared to an 8% degradation for the original Wang and Lemieux algorithm.

[1] Steven J. E. Wilton, et al. Towards scalable FPGA CAD through architecture, 2011, FPGA '11.

[2] Carl Sechen, et al. A loosely coupled parallel algorithm for standard cell placement, 1994, ICCAD '94.

[3] André DeHon, et al. Hardware-assisted simulated annealing with application for fast FPGA placement, 2003, FPGA '03.

[4] Russell Tessier. Fast placement approaches for FPGAs, 2002, TODAES.

[5] Rob A. Rutenbar, et al. Placement by Simulated Annealing on a Multiprocessor, 1987, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[6] Prithviraj Banerjee, et al. Parallel Simulated Annealing Algorithms for Cell Placement on Hypercube Multiprocessors, 1990, IEEE Trans. Parallel Distributed Syst.

[7] Vaughn Betz, et al. Architecture and CAD for Deep-Submicron FPGAs, 1999, The Springer International Series in Engineering and Computer Science.

[8] Alok N. Choudhary, et al. Parallel algorithms for FPGA placement, 2000, ACM Great Lakes Symposium on VLSI.

[9] G. Lemieux, et al. Un/DoPack: Re-Clustering of Large System-on-Chip Designs with Interconnect Variation for Low-Cost FPGAs, 2006, 2006 IEEE/ACM International Conference on Computer Aided Design.

[10] Guy Lemieux, et al. Scalable and deterministic timing-driven parallel placement for FPGAs, 2011, FPGA '11.

[11] Andrew A. Kennings, et al. Improving Simulated Annealing-Based FPGA Placement With Directed Moves, 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[12] Jianwen Zhu, et al. Parallelizing Simulated Annealing-Based Placement Using GPGPU, 2010, 2010 International Conference on Field Programmable Logic and Applications.

[13] Vaughn Betz, et al. High-quality, deterministic parallel placement for FPGAs on commodity hardware, 2008, FPGA '08.

[14] Michael Santarini, et al. Xilinx Tailors Four Tool Flows to Customer Design Disciplines in ISE Design Suite, 2009.

[15] Kenneth B. Kent, et al. VPR 5.0: FPGA CAD and architecture exploration tools with single-driver routing, heterogeneity and process scaling, 2011, TRETS.