Enabling scalable chiplet-based uniform memory architectures with silicon photonics

Chiplet-based systems have recently received much attention for scaling up processing power in HPC systems due to their high energy efficiency and low-cost manufacturing; however, large inter-chiplet NUMA latencies, distance-related energy overheads, and limited I/O bandwidth caused by state-of-the-art packaging and interconnect technologies substantially limit their scalability. The large last-level caches (LLCs) of current systems (up to 16 MiB per chiplet and 40% of chiplet area) can only temporarily hide these limitations, and they come at the high cost and leakage power of SRAM cells. In this paper, we propose the use of integrated silicon-photonic (SiPh) interconnects on an organic package substrate, which combines low material costs with high I/O bandwidth, distance-independent energy consumption, and a low-latency point-to-point interconnection fabric to effectively overcome current interconnect and packaging limitations. We exploit the properties of this fabric to propose a scalable uniform memory architecture (S-UMA) that overcomes all NUMA-related performance challenges. Moreover, we propose exploiting our low-latency SiPh fabric to remove the large LLCs from the processor chiplets and re-integrate them into separate chiplets. Doing so can increase manufacturing yield through smaller chiplets, allow the most efficient process to be used for SRAM circuits, or ease the integration of alternative memory technologies without performance hits. Compared to state-of-the-art architectures, S-UMA offers a 23% performance speed-up and 30% network power savings on average across HPC workloads for an 8-chiplet, 64-core system.
