Exploring Silicon Nanophotonics in Throughput Architecture

In high-performance throughput processors, massive parallelism and the ever-increasing memory accesses of emerging workloads create power and energy bottlenecks in the interconnect. These processors also expose bandwidth and latency bottlenecks in the on-chip interconnect and in off-chip memory access. To eliminate these bottlenecks, we propose using silicon nanophotonics and 3-D stacking technologies in throughput architectures. Together, these technologies provide higher communication bandwidth and lower-latency signaling at reduced power. We evaluate a 3-D stacked GPU with 2048 SIMD cores connected by a photonic interconnect. The photonic multiple-writer-single-reader crossbar network with 32-B channel bandwidth achieves a 91% reduction in network power on average. In addition, it improves average core-to-memory access latency by 59% for northbound traffic and 87% for southbound traffic. We anticipate that the proposed ideas will have far-reaching implications for emerging workloads and microarchitectures.
