An Electro-Photonic System for Accelerating Deep Neural Networks

The number of parameters in deep neural networks (DNNs) is scaling at about 5× the rate of Moore’s Law. Sustaining this growth requires new technologies and computing architectures. Photonic computing systems are a promising avenue, since they can perform the general matrix-matrix multiplication (GEMM) operations that dominate DNNs at a higher throughput than their electronic counterparts. However, purely photonic systems face several challenges, including a lack of photonic memory, the need for conversion circuits, and the accumulation of noise. In this paper, we propose a hybrid electro-photonic system that combines the strengths of both technologies to accelerate DNNs. In contrast to prior work on photonic and electronic accelerators, we adopt a system-level perspective. Our electro-photonic system includes an electronic host processor and DRAM, and a custom electro-photonic hardware accelerator called ADEPT. The fused hardware accelerator uses a photonic computing unit to perform highly efficient GEMM operations and a digital electronic ASIC for storage and for non-GEMM operations. We also identify architectural optimization opportunities for improving ADEPT’s overall efficiency. We evaluate ADEPT using three state-of-the-art neural networks (ResNet-50, BERT-large, and RNN-T) to show its general applicability in accelerating today’s DNNs. A head-to-head comparison of ADEPT with systolic array architectures shows that ADEPT can provide, on average, 7.19× higher inference throughput per watt.
