Low latency compute node architecture cooled by a two phase fluid flow

As high-performance multi-core scalar CPU and vector GPU processors approach 256 GFLOPS of processing power, transport latency and bandwidth (BW) between on-board DRAM and the processor become a substantial bottleneck to optimal system performance. This is in large part because board-level data transport occurs over legacy L-C transmission lines, which offer limited BW over a limited distance. As a consequence, high-performance systems running memory-intensive applications can use only a fraction of their available computational potential and sit idle for many clock cycles while waiting for data and instructions. A number of alternative short-range transport technologies are listed in the International Technology Roadmap for Semiconductors, among which the most promising is inter-chip optical communication. This paper proposes scalable, guided millimeter-wave inter-chip communication with high-speed I/Os on a common co-planar wiring net to reduce latency. High-order digital M-QAM modulation is used to scale the spectral efficiency of the carrier-wave coding. The design uses 3D DRAM stacking to achieve DRAM volume and 3D interposer stacking to achieve a high I/O count and wiring escape BW. The compute module is designed to be enclosed and cooled by direct hydrofluorocarbon jet spray or pool flow.
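As a rough illustration of how M-QAM order scales spectral efficiency, the sketch below computes the bits carried per symbol, log2(M), and the corresponding ideal raw bit rate for a given symbol rate. The function names and the symbol rate in the usage note are hypothetical and are not taken from the paper; the calculation ignores coding overhead, pulse shaping, and channel impairments.

```python
def qam_bits_per_symbol(m: int) -> int:
    """Bits carried by one M-QAM symbol, i.e. log2(M).

    Assumes M is a power of two (4, 16, 64, 256, ...).
    """
    if m < 2 or m & (m - 1):
        raise ValueError("M must be a power of two >= 2")
    return m.bit_length() - 1


def ideal_bit_rate(symbol_rate_baud: float, m: int) -> float:
    """Ideal raw bit rate in bit/s: symbol rate times bits per symbol.

    This is an upper bound; real links lose some of it to coding
    overhead and filtering.
    """
    return symbol_rate_baud * qam_bits_per_symbol(m)


if __name__ == "__main__":
    # Each step from M to 4M adds two bits per symbol on the same carrier.
    for order in (4, 16, 64, 256):
        print(order, qam_bits_per_symbol(order))
```

For example, with a purely illustrative symbol rate of 1 Gbaud, `ideal_bit_rate(1e9, 64)` gives six times the symbol rate, because 64-QAM carries six bits per symbol.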
