Extending the Power-Efficiency and Performance of Photonic Interconnects for Heterogeneous Multicores with Machine Learning

As communication energy exceeds computation energy in future technologies, traditional on-chip electrical interconnects face fundamental challenges in the many-core era. Photonic interconnects have been proposed as a disruptive technology solution due to superior performance per Watt, distance independent energy consumption and CMOS compatibility for on-chip interconnects. Static power due to the laser being always switched on, varying link utilization due to spatial and temporal traffic fluctuations and thermal sensitivity are some of the critical challenges facing photonics interconnects. In this paper, we propose photonic interconnects for heterogeneous multicores using a checkerboard pattern that clusters CPU-GPU cores together and implements bandwidth reconfiguration using local router information without global coordination. To reduce the static power, we also propose a dynamic laser scaling technique that predicts the power level for the next epoch using the buffer occupancy of previous epoch. To further improve power-performance trade-offs, we also propose a regression-based machine learning technique for scaling the power of the photonic link. Our simulation results demonstrate a 34% performance improvement over a baseline electrical CMESH while consuming 25% less energy per bit when dynamically reallocating bandwidth. When dynamically scaling laser power, our buffer-based reactive and ML-based proactive prediction techniques show 40 - 65% in power savings with 0 - 14% in throughput loss depending on the reservation window size.

[1]  Venkatesh Akella,et al.  Addressing system-level trimming issues in on-chip nanophotonic networks , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[2]  Yu Zhang,et al.  Firefly: illuminating future network-on-chip with nanophotonics , 2009, ISCA '09.

[3]  David R. Kaeli,et al.  Leveraging Silicon-Photonic NoC for Designing Scalable GPUs , 2015, ICS.

[4]  Ahmed Louri,et al.  3D-NoC: Reconfigurable 3D photonic on-chip interconnect for multicores , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[5]  Chen Sun,et al.  DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.

[6]  Ahmed Louri,et al.  Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[7]  Nevin Kirman,et al.  A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing , 2010, ASPLOS 2010.

[8]  David R. Kaeli,et al.  Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[9]  Luca P. Carloni,et al.  Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors , 2008, IEEE Transactions on Computers.

[10]  José F. Martínez,et al.  A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing , 2010, ASPLOS XV.

[11]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[12]  Sudhakar Yalamanchili,et al.  Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture , 2013, J. Parallel Distributed Comput..

[13]  Jaewon Lee,et al.  GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[14]  Xiang Zhang,et al.  A multilayer nanophotonic interconnection network for on-chip many-core communications , 2010, Design Automation Conference.

[15]  Mark Horowitz,et al.  1.1 Computing's energy problem (and what we can do about it) , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[16]  Christopher Batten,et al.  Re-architecting DRAM memory systems with monolithically integrated silicon photonics , 2010, ISCA.

[17]  Feng Liu,et al.  Dynamically managed data for CPU-GPU architectures , 2012, CGO '12.

[18]  Tao Li,et al.  Integrating nanophotonics in GPU microarchitecture , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[19]  Mikko H. Lipasti,et al.  Wavelength stealing: An opportunistic approach to channel sharing in multi-chip photonic interconnects , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[20]  Li-Shiuan Peh,et al.  Enabling system-level modeling of variation-induced faults in Networks-on-Chips , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[21]  Haoran Li,et al.  MOCA: An inter/intra-chip optical network for memory , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[22]  Li Zhou,et al.  PROBE: Prediction-based optical bandwidth scaling for energy-efficient NoCs , 2013, 2013 Seventh IEEE/ACM International Symposium on Networks-on-Chip (NoCS).

[23]  Jeffrey S. Vetter,et al.  A Survey of CPU-GPU Heterogeneous Computing Techniques , 2015, ACM Comput. Surv..

[24]  Bin Liu,et al.  KiloCore: A 32-nm 1000-Processor Computational Array , 2017, IEEE Journal of Solid-State Circuits.

[25]  Christopher Batten,et al.  Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics , 2008, 2008 16th IEEE Symposium on High Performance Interconnects.

[26]  David A. B. Miller,et al.  Device Requirements for Optical Interconnects to Silicon Chips , 2009, Proceedings of the IEEE.

[27]  M. Lipson,et al.  Wide temperature range operation of micrometer-scale silicon electro-optic modulators. , 2008, Optics letters.

[28]  Chen Sun,et al.  Addressing link-level design tradeoffs for integrated photonic interconnects , 2011, 2011 IEEE Custom Integrated Circuits Conference (CICC).

[29]  John Kim,et al.  FeatherWeight: Low-cost optical arbitration with QoS support , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[30]  Christopher Batten,et al.  Silicon-photonic clos networks for global on-chip communication , 2009, 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip.

[31]  George Kurian,et al.  ATAC: A 1000-core cache-coherent processor with on-chip optical network , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[32]  Rajeev J. Ram,et al.  Single-chip microprocessor that communicates directly using light , 2015, Nature.

[33]  Sudhakar Yalamanchili,et al.  Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures , 2013, ACM Trans. Design Autom. Electr. Syst..

[34]  Jung Ho Ahn,et al.  Corona: System Implications of Emerging Nanophotonic Technology , 2008, 2008 International Symposium on Computer Architecture.

[35]  Jung Ho Ahn,et al.  Devices and architectures for photonic chip-scale integration , 2009 .

[36]  Henry Wong,et al.  Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[37]  S. J. B. Yoo,et al.  Towards athermal optically-interconnected computing system using slotted silicon microring resonators and RF-photonic comb generation , 2009 .

[38]  Christian Bienia,et al.  PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors , 2009 .

[39]  Zhongliang Chen,et al.  Exploring the heterogeneous design space for both performance and reliability , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[40]  Seok-Hwan Jeong,et al.  Highly-efficient, low-noise Si hybrid laser using flip-chip bonded SOA , 2012, 2012 Optical Interconnects Conference.

[41]  Ketan Patel,et al.  High-power single-mode InGaAsP/InP laser diodes for pulsed operation , 2012, OPTO.

[42]  Radu Marculescu,et al.  SVR-NoC: A performance analysis tool for Network-on-Chips using learning-based support vector regression model , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[43]  Nikolaos Hardavellas,et al.  SLaC: Stage laser control for a flattened butterfly network , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[44]  John E. Bowers,et al.  Energy Efficient and Energy Proportional Optical Interconnects for Multi-Core Processors: Driving the Need for On-Chip Sources , 2014, IEEE Journal of Selected Topics in Quantum Electronics.

[45]  Shekhar Y. Borkar Exascale Computing - A Fact or a Fiction? , 2013, IPDPS.

[46]  Mahmut T. Kandemir,et al.  Managing GPU Concurrency in Heterogeneous Architectures , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[47]  Andrew B. Kahng,et al.  Adaptive Tuning of Photonic Devices in a Photonic NoC Through Dynamic Workload Allocation , 2017, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[48]  Kai Li,et al.  PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on Chip-Multiprocessors , 2008, 2008 IEEE International Symposium on Workload Characterization.

[49]  Xuhao Chen,et al.  Adaptive Cache Management for Energy-Efficient GPU Computing , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[50]  Radu Marculescu,et al.  Hybrid network-on-chip architectures for accelerating deep learning kernels on heterogeneous manycore platforms , 2016, 2016 International Conference on Compliers, Architectures, and Sythesis of Embedded Systems (CASES).

[51]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[52]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[53]  Amlan Ganguly,et al.  Heterogeneous photonic Network-on-Chip with dynamic bandwidth allocation , 2014, 2014 27th IEEE International System-on-Chip Conference (SOCC).

[54]  Lin Yang,et al.  Four-Port Optical Switch for Fat-Tree Photonic Network-on-Chip , 2017, Journal of Lightwave Technology.