NUPLet: A Photonic Based Multi-Chip NUCA Architecture

Area, manufacturing yield, and the lack of scalable interconnects restrict single-chip designs to a small number of cores (16-32). Multi-chip designs built with silicon photonics can overcome these area and yield constraints and make it possible to design a virtual chip that scales to a large number of cores. Unfortunately, the scalability of such designs is limited by the high percentage of inter-chip messages and the relatively low hit rate in remote cache banks. In this paper, we propose NUPLet, a multi-chip architecture that addresses these limitations by separating the intra-chip and inter-chip networks. NUPLet uses a non-uniform cache architecture (NUCA) scheme on top of a virtual chip to decrease inter-chip communication and increase the hit rate in the last-level cache. In addition, we propose a mechanism that predicts the number of inter-chip messages in the network. The prediction is used to modulate the laser power accordingly and reduce static power consumption. We simulated a four-chip NUPLet design in which each chip contains 32 cores. For a suite of SPLASH-2 and PARSEC benchmarks, NUPLet increased the last-level cache hit rate by 70% compared with other state-of-the-art proposals. Furthermore, NUPLet improved performance by 28%, reduced power consumption by 39%, and reduced the energy-delay-squared product (ED^2) by 41%.
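The abstract does not say how the inter-chip message predictor works or how its output drives the laser. The following Python sketch is one possible scheme, offered only as an illustration. It assumes an epoch-based controller that predicts the next epoch's inter-chip message count with an exponentially weighted moving average (EWMA), then switches on the fewest laser power levels whose capacity covers that prediction plus a safety margin. The class name `LaserPowerController` and the parameters `capacity_per_level`, `max_levels`, `alpha`, and `margin` are hypothetical and do not come from NUPLet itself.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaserPowerController:
    """Hypothetical epoch-based laser modulator (an assumption, not NUPLet's design).

    Predicts the next epoch's inter-chip message count with an EWMA and
    picks the fewest laser power levels that can carry that load.
    """
    capacity_per_level: int           # messages one power level carries per epoch (assumed)
    max_levels: int                   # total laser power levels available (assumed)
    alpha: float = 0.5                # EWMA weight on the most recent epoch (assumed)
    margin: float = 0.1               # headroom to absorb under-prediction (assumed)
    prediction: Optional[float] = None

    def observe(self, messages: int) -> None:
        # Fold the message count observed in the epoch that just ended into the prediction.
        if self.prediction is None:
            self.prediction = float(messages)  # seed with the first observation
        else:
            self.prediction = self.alpha * messages + (1 - self.alpha) * self.prediction

    def levels_for_next_epoch(self) -> int:
        # With no history yet, run at full power so no traffic is throttled.
        if self.prediction is None:
            return self.max_levels
        needed = self.prediction * (1 + self.margin)
        levels = math.ceil(needed / self.capacity_per_level)
        # Keep at least one level on and never exceed the available levels.
        return max(1, min(self.max_levels, levels))


if __name__ == "__main__":
    ctrl = LaserPowerController(capacity_per_level=100, max_levels=8)
    for count in [120, 300, 310, 50]:
        ctrl.observe(count)
        print(count, ctrl.levels_for_next_epoch())
```

Laser power is largely static, because it is spent whether or not any messages are sent. Scaling the number of active power levels to the predicted load is what cuts that static cost. The margin trades off some of the savings against the risk of throttling traffic when the predictor guesses too low.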
