ABCDPlace: Accelerated Batch-Based Concurrent Detailed Placement on Multithreaded CPUs and GPUs

Placement is an important step in modern very-large-scale integrated (VLSI) designs. Detailed placement is a placement refinement procedure called intensively throughout the design flow, so its efficiency has a vital impact on design closure. However, since most detailed placement techniques are inherently greedy and sequential, they are generally difficult to parallelize. In this article, we present a concurrent detailed placement framework, ABCDPlace, exploiting multithreading and graphics processing unit (GPU) acceleration. We propose batch-based concurrent algorithms for widely adopted sequential detailed placement techniques, such as independent set matching, global swap, and local reordering. The experimental results demonstrate that ABCDPlace can achieve 2× to 5× faster runtime than sequential implementations with multithreaded CPU and over 10× with GPU on ISPD 2005 contest benchmarks without quality degradation. On larger industrial benchmarks, we show more than 16× speedup with GPU over the state-of-the-art sequential detailed placer. ABCDPlace finishes the detailed placement of a 10-million-cell industrial design in 1 min.
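
As a rough illustration of why independent sets matter for batching, the sketch below greedily collects cells that share no net, so that moving one of them cannot change the wirelength seen by another. It is a minimal example under assumed inputs, not the ABCDPlace implementation: the function name `greedy_independent_set`, the `cell_nets` mapping, and the toy netlist are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_independent_set(cell_nets: Dict[str, Set[str]],
                           order: List[str]) -> List[str]:
    """Visit cells in `order` and keep each cell that shares no net
    with any cell already kept (a greedy maximal independent set)."""
    used_nets: Set[str] = set()   # nets touched by the cells kept so far
    chosen: List[str] = []
    for cell in order:
        nets = cell_nets[cell]
        if used_nets.isdisjoint(nets):
            chosen.append(cell)
            used_nets |= nets
    return chosen


# Hypothetical toy netlist: cell name -> set of nets it connects to.
cell_nets = {
    "a": {"n1", "n2"},
    "b": {"n2", "n3"},
    "c": {"n4"},
    "d": {"n3", "n5"},
}

batch = greedy_independent_set(cell_nets, ["a", "b", "c", "d"])
print(batch)  # ['a', 'c', 'd']
```

Cell `b` is skipped because it shares net `n2` with `a`. The remaining cells form one batch whose members could, in principle, be evaluated concurrently.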
