Low-latency, high-throughput memory-processor interface

Scaling to ExaFLOPS computing, roughly 100 times faster than the present Fujitsu K supercomputer, presents well-known challenges, among them power dissipation, memory capacity and access bandwidth, data locality, and fault tolerance. The optimal strategy for Amdahl's-law speed-up is multifaceted, but greater memory bandwidth and lower access latency are generally recognized as areas to improve. To this end, an evolutionary compute-node architecture is considered, based on a multichip interposer platform and a millimeter-wave memory interface. The interposer serves as the physical platform of the compute node and as the wiring distribution layer connecting the chip multiprocessor (CMP) and on-interposer memory to an organic board. For example, the interposer may be made of glass to reduce through-via parasitics and may support one multi-GFLOPS CMP with sufficient on-interposer DRAM for balanced operation. The memory interface consists of dense arrays of millimeter-wave waveguides with integrated mm-wave transceivers and should support 40 Gb/s per channel, for an aggregate throughput of 1 TB/s with an estimated latency of 10-15 clock cycles. This paper examines channel impairments, design, and construction. Data transmission at 12 Gb/s using OOK modulation on a 72 GHz carrier will be presented at the conference, if available.
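As a rough consistency check on the link figures quoted in the abstract, the following sketch works through the arithmetic. It assumes (none of this comes from the paper) that 1 TB/s means 10^12 bytes/s at 8 bits per byte with no line-coding or protocol overhead, that the processor runs at a hypothetical 3 GHz clock for converting the 10-15 cycle latency into nanoseconds, and that OOK uses rectangular (NRZ) pulses, whose main spectral lobe spans twice the bit rate around the carrier. The helper names are hypothetical and for illustration only.

```python
# Back-of-envelope arithmetic for the mm-wave memory link.
# ASSUMPTIONS (not from the paper): 1 TB/s = 10**12 bytes/s, 8 bits/byte,
# no coding overhead, a hypothetical 3 GHz clock, and NRZ (rectangular) OOK pulses.
# All function names are hypothetical helpers for illustration.

BITS_PER_BYTE = 8


def channels_needed(aggregate_bytes_per_s: int, per_channel_bits_per_s: int) -> int:
    """Smallest whole number of channels reaching the aggregate throughput."""
    aggregate_bits_per_s = aggregate_bytes_per_s * BITS_PER_BYTE
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-aggregate_bits_per_s // per_channel_bits_per_s)


def cycles_to_ns(cycles: float, clock_hz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock rate."""
    return cycles / clock_hz * 1e9


def ook_main_lobe_ghz(carrier_ghz: float, bit_rate_gbps: float) -> tuple[float, float]:
    """Null-to-null main lobe of NRZ OOK: carrier +/- one bit rate (2 * Rb wide)."""
    return (carrier_ghz - bit_rate_gbps, carrier_ghz + bit_rate_gbps)


if __name__ == "__main__":
    TB_PER_S = 10**12   # bytes per second
    GBPS = 10**9        # bits per second
    CLOCK_HZ = 3e9      # hypothetical 3 GHz processor clock

    print("40 Gb/s channels for 1 TB/s:", channels_needed(TB_PER_S, 40 * GBPS))
    print("12 Gb/s channels for 1 TB/s:", channels_needed(TB_PER_S, 12 * GBPS))

    lo_ns = cycles_to_ns(10, CLOCK_HZ)
    hi_ns = cycles_to_ns(15, CLOCK_HZ)
    print(f"10-15 cycles at 3 GHz: {lo_ns:.2f}-{hi_ns:.2f} ns")

    low, high = ook_main_lobe_ghz(72, 12)
    print(f"12 Gb/s OOK main lobe on a 72 GHz carrier: {low}-{high} GHz")
```

Under these assumptions, 1 TB/s needs 200 channels at 40 Gb/s but 667 at the 12 Gb/s rate, 10-15 cycles correspond to roughly 3.3-5 ns, and the 12 Gb/s OOK main lobe occupies about 60-84 GHz, a fractional bandwidth of one third of the carrier over which waveguide dispersion may be significant.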
