Depth-bounded Graph Partitioning Algorithm and Dual Clocking Method for Realization of Superconducting SFQ Circuits

Superconducting Single Flux Quantum (SFQ) logic, with a switching delay of 1 ps and a switching energy of 10⁻¹⁹ J, is a promising candidate for replacing Complementary Metal Oxide Semiconductor (CMOS) technology to achieve very high speed and ultra-high energy efficiency. Conventional SFQ circuits need Full Path Balancing (FPB), which tends to require the insertion of many path balancing buffers (D-Flip-Flops). The FPB method increases the total power consumption as well as the total area of the chip. This article presents a novel scheme for the realization of superconducting SFQ circuits that combines a new depth-bounded graph partitioning algorithm with a dual clocking method (slow and fast clock pulses) to minimize these path balancing overheads. Experimental results show that the proposed solution reduces the total number of path balancing buffers by an average of 2.68× and the total static power consumption by an average of 60%, compared to the best of the other methods for realizing SFQ circuits. However, our scheme degrades the peak throughput; it is therefore especially valuable when the actual throughput of the SFQ circuit is much lower than the peak theoretical throughput. This is typically the case because of high-level data dependencies in the application that feeds data into an SFQ circuit.
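To make the FPB overhead concrete, the following minimal sketch counts the path balancing buffers that a conventional fully path-balanced netlist would need. It is not the partitioning algorithm or dual clocking method proposed in the article. It assumes that every gate is clocked and therefore occupies exactly one logic level, that buffers are not shared among fanouts, and that splitters are ignored. The names `levels`, `fpb_buffer_count`, and the toy netlist are hypothetical and chosen only for illustration.

```python
def levels(fanins, primary_inputs):
    """Return the logic level of every node: inputs are 0, a gate is 1 + its deepest fanin."""
    level = {pi: 0 for pi in primary_inputs}

    def visit(node):
        if node not in level:
            level[node] = 1 + max(visit(f) for f in fanins[node])
        return level[node]

    for gate in fanins:
        visit(gate)
    return level


def fpb_buffer_count(fanins, primary_inputs, primary_outputs):
    """Count buffers needed so that all fanins of a gate, and all outputs, sit at equal depth."""
    level = levels(fanins, primary_inputs)
    depth = max(level[o] for o in primary_outputs)
    # An edge that skips k levels needs k buffers (assumption: no sharing among fanouts).
    on_edges = sum(level[g] - level[f] - 1 for g, fs in fanins.items() for f in fs)
    # Outputs shallower than the deepest output are padded up to the full depth.
    on_outputs = sum(depth - level[o] for o in primary_outputs)
    return on_edges + on_outputs


# Toy netlist (hypothetical): a = AND(x, y), b = OR(a, z), c = NOT(x)
toy_fanins = {"a": ["x", "y"], "b": ["a", "z"], "c": ["x"]}
toy_count = fpb_buffer_count(toy_fanins, ["x", "y", "z"], ["b", "c"])
print(toy_count)  # 2: one buffer on z -> b, one on output c
```

Even in this three-gate example, two of the five clocked cells would be buffers, which illustrates why reducing the buffer count matters for area and static power.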
