COOL-0: Design of an RSFQ subsystem for petaflops computing

We discuss a preliminary design of a Rapid Single-Flux-Quantum (RSFQ) subsystem for general-purpose computers with petaflops-scale performance. The subsystem is being developed at Stony Brook within the framework of the Hybrid Technology MultiThreading (HTMT) project. The COOL-0 design is based on 0.8-µm RSFQ technology, which enables the implementation of superconductor processing elements (SPELLs) operating at clock frequencies up to 100 GHz, a pipelined cryo-memory (CRAM) with a 30 ps cycle time, and an interprocessor network (CNET) with a bandwidth of 30 Gbps per channel. The main architectural challenge is the almost 1,000-fold speed difference between the RSFQ processors and the room-temperature SRAM that forms the second level of the HTMT memory hierarchy. The proposed solution is hardware support in the SPELLs for two-level multithreading and block transfer techniques. Our preliminary estimates show that an RSFQ subsystem with 4 K SPELLs and a 4-Gbyte CRAM may be sufficient to achieve performance close to 0.5 petaflops for computationally intensive program kernels. COOL-0 would occupy a physical space of about 0.5 m³ and dissipate as little as 250 W (at helium temperature). These numbers represent a dramatic improvement over a hypothetical purely semiconductor petaflops-scale computer.
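The headline figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that "4 K" means 4096 SPELLs and that one petaflops is 1e15 floating-point operations per second, and all function names are hypothetical. It derives the per-SPELL throughput implied by the 0.5-petaflops estimate, the clock period at 100 GHz, the SRAM time scale implied by the roughly 1,000-fold speed gap, and the throughput per watt dissipated at helium temperature.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# Assumptions (not stated in the source): "4 K" = 4096 SPELLs,
# and 1 petaflops = 1e15 floating-point operations per second.

NUM_SPELLS = 4096        # assumed reading of "4 K SPELLs"
TARGET_FLOPS = 0.5e15    # "close to 0.5 petaflops"
CLOCK_HZ = 100e9         # "up to 100 GHz" (an upper bound on the clock)
SPEED_RATIO = 1000       # "almost 1,000-fold speed difference"
POWER_W = 250.0          # dissipation at helium temperature


def per_spell_flops(total_flops=TARGET_FLOPS, n_spells=NUM_SPELLS):
    """Throughput each SPELL must sustain to reach the total."""
    return total_flops / n_spells


def flops_per_cycle(clock_hz=CLOCK_HZ):
    """Operations each SPELL must complete per clock cycle."""
    return per_spell_flops() / clock_hz


def clock_period_ps(clock_hz=CLOCK_HZ):
    """Clock period in picoseconds."""
    return 1e12 / clock_hz


def implied_sram_time_ns(ratio=SPEED_RATIO):
    """SRAM time scale implied by the stated speed ratio, in nanoseconds."""
    return clock_period_ps() * ratio / 1000.0


def flops_per_watt(total_flops=TARGET_FLOPS, power_w=POWER_W):
    """Throughput per watt dissipated at helium temperature."""
    return total_flops / power_w


if __name__ == "__main__":
    print(f"per-SPELL throughput: {per_spell_flops() / 1e9:.1f} Gflops")
    print(f"operations per cycle: {flops_per_cycle():.2f}")
    print(f"clock period:         {clock_period_ps():.0f} ps")
    print(f"implied SRAM scale:   {implied_sram_time_ns():.0f} ns")
    print(f"throughput per watt:  {flops_per_watt():.1e} flops/W")
```

Under these assumptions each SPELL must sustain roughly 122 Gflops, or a little more than one floating-point operation per cycle at the full 100 GHz clock, which suggests why the design relies on multithreading and block transfers to keep the processors busy while they wait on memory that is about a thousand cycles away. The per-watt figure covers only the power dissipated at helium temperature and does not account for refrigeration.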
