Low latency compute node architecture cooled by a two phase fluid flow

As high-performance multi-core scalar CPUs and vector GPUs approach 256 GFLOPS of processing power, the transport latency and bandwidth (BW) between on-board DRAM and the processor become a substantial bottleneck to optimal system performance. This is largely because board-level data transport occurs over legacy L-C transmission lines, which offer limited BW over limited distances. As a consequence, high-performance systems running memory-intensive applications can use only a fraction of their available computational potential and remain idle for many clock cycles while waiting for data and instructions. The International Technology Roadmap for Semiconductors lists a number of alternative short-range transport technologies, among which the most promising is inter-chip optical communication. This paper proposes scalable, guided millimeter-wave inter-chip communication, with high-speed I/Os on a common co-planar wiring net, to reduce latency. High-order digital M-QAM modulation is used to scale the spectral efficiency of the carrier-wave coding. 3D DRAM stacking provides DRAM volume, and 3D interposer stacking provides a high I/O count and wiring-escape BW. The compute module is designed to be enclosed and cooled by direct hydrofluorocarbon jet spray or pool flow.
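The claim that higher-order M-QAM scales spectral efficiency can be made concrete with a little arithmetic: each M-QAM symbol carries log2(M) bits, and a passband channel of width B carries about B / (1 + α) symbols per second when raised-cosine pulses with roll-off α are used. The sketch below is illustrative only and is not taken from the paper. The roll-off, per-carrier bandwidth, and carrier count in the example are hypothetical values, and the calculation ignores FEC, framing overhead, and the higher SNR that denser constellations require.

```python
def bits_per_symbol(m: int) -> int:
    """Bits carried by one M-QAM symbol (log2 M); M must be a power of two >= 2."""
    if m < 2 or m & (m - 1):
        raise ValueError("M must be a power of two >= 2")
    return m.bit_length() - 1  # exact log2 for powers of two


def spectral_efficiency(m: int, rolloff: float = 0.0) -> float:
    """Bits/s/Hz of passband bandwidth, assuming raised-cosine pulses.

    A passband channel of width B carries B / (1 + rolloff) symbols per second.
    """
    if not 0.0 <= rolloff <= 1.0:
        raise ValueError("rolloff must be in [0, 1]")
    return bits_per_symbol(m) / (1.0 + rolloff)


def aggregate_rate_gbps(m: int, bandwidth_ghz: float, channels: int,
                        rolloff: float = 0.0) -> float:
    """Raw bit rate (Gb/s) over parallel carriers; ignores FEC and framing."""
    return spectral_efficiency(m, rolloff) * bandwidth_ghz * channels


if __name__ == "__main__":
    # Hypothetical link: 8 guided mm-wave carriers, 10 GHz each, roll-off 0.25.
    for m in (4, 16, 64, 256):
        se = spectral_efficiency(m, rolloff=0.25)
        rate = aggregate_rate_gbps(m, bandwidth_ghz=10.0, channels=8, rolloff=0.25)
        print(f"{m:>3}-QAM: {se:.1f} b/s/Hz, {rate:.0f} Gb/s")
```

Under these assumptions, moving from 4-QAM (2 bits/symbol) to 16-QAM (4 bits/symbol) doubles throughput in the same bandwidth. This is the lever the abstract refers to, and it is traded against tighter SNR and linearity requirements on the guided channel.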