Parallel multi-level analytical global placement on graphics processing units

GPU platforms are becoming increasingly attractive for implementing accelerators because they feature a large number of cores and improved programmability. In this paper, we describe our implementation of mPL, a state-of-the-art academic multi-level analytical placer, on NVIDIA's massively parallel GT200-series platforms. We detail our performance tuning and optimization efforts. Compared with a software implementation on a recent-generation Intel Xeon CPU, the global placement stage of mPL runs 15× faster on average on a Tesla C1060 card, with comparable wirelength (WL): less than 1% WL degradation on average.
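To make the WL comparison concrete, the following is a minimal sketch, not the authors' code. It assumes WL is measured as half-perimeter wirelength (HPWL), the usual placement metric, and uses hypothetical function names and toy data. It also illustrates why wirelength evaluation maps naturally onto many GPU cores: each net's bounding box depends only on its own pins, so nets can be processed independently (for example, one thread per net) and then summed. Whether mPL's GPU port is organized this way is an assumption here.

```python
# Hypothetical sketch: HPWL per net, a data-parallel sum, and the relative
# WL degradation used to compare two placements. Not mPL's implementation.
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position of a pin


def net_hpwl(pins: Sequence[Point]) -> float:
    """Half perimeter of one net's bounding box (the work of one 'thread')."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: List[Sequence[Point]]) -> float:
    """Nets are independent, so this map is parallelizable; the sum is a reduction."""
    return sum(net_hpwl(pins) for pins in nets)


def wl_degradation_percent(wl_new: float, wl_ref: float) -> float:
    """WL change of a new placement relative to a reference, in percent."""
    return 100.0 * (wl_new - wl_ref) / wl_ref


if __name__ == "__main__":
    # Hypothetical toy placements of the same two nets (CPU vs. GPU result).
    cpu_nets = [[(0.0, 0.0), (2.0, 1.0)], [(1.0, 1.0), (3.0, 4.0), (2.0, 2.0)]]
    gpu_nets = [[(0.0, 0.0), (2.0, 1.0)], [(1.0, 1.0), (3.0, 4.05), (2.0, 2.0)]]
    ref = total_hpwl(cpu_nets)
    new = total_hpwl(gpu_nets)
    print(ref, new, wl_degradation_percent(new, ref))
```

In this toy case the reference HPWL is 8.0 and the second placement is about 0.6% longer, which would fall within the "less than 1% WL degradation" range the abstract reports for real benchmarks.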
