3D photonics as an enabling technology for deep 3D DRAM stacking

3D stacking improves DRAM bandwidth, energy, and latency by exploiting shorter and more abundant wiring in three dimensions. While future stacks are predicted to contain tens of DRAM layers, TSV pitches are expected to stop shrinking because of physical limitations. Preserving bandwidth scaling will therefore require either large area overheads or higher per-pin data rates at a higher energy cost. In addition, deeper 3D DRAM stacks increase the average number of TSV hops needed to reach the target DRAM layer. Recent advances in vertical silicon-photonic interconnects now allow diameters of 1-2 μm, which gives optics a high bandwidth density. In this paper, we explore the benefits and architectural implications of vertical optical interconnects in terms of area, power, and performance. We propose a hierarchical approach that scales 3D DRAM stacks to tens of layers by using sub-stacks that are optically interconnected to a memory interface on the processor die. Our results show that photonics could be a key enabler for deep 3D DRAM, offering at least 2× interconnect area savings compared to TSVs for the same bandwidth, with comparable performance and lower power.
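The area and hop-count arguments in the abstract can be made concrete with a back-of-envelope model. The sketch below is not the paper's evaluation methodology. Every parameter (the pitches, the per-link data rates, the 1000 Gb/s target, the 32-layer stack, the 8-layer sub-stacks) is a hypothetical placeholder, and the names `VerticalLink`, `avg_tsv_hops_flat`, and `avg_tsv_hops_hierarchical` are illustrative. It assumes each link occupies one pitch × pitch grid cell (so transceiver, laser, and keep-out overheads must be folded into the pitch), and that each sub-stack's optical link lands at the base of that sub-stack, leaving only the TSV hops inside one sub-stack.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VerticalLink:
    """One kind of vertical interconnect (e.g. a TSV or an optical via).

    pitch_um and gbps_per_link are hypothetical inputs, not values from the paper.
    """
    name: str
    pitch_um: float       # center-to-center pitch, assumed to include all spacing
    gbps_per_link: float  # data rate carried by a single link

    def links_needed(self, target_gbps: float) -> int:
        # Round up: a fractional link still costs a whole via site.
        return math.ceil(target_gbps / self.gbps_per_link)

    def area_mm2(self, target_gbps: float) -> float:
        # Each link is assumed to occupy one pitch x pitch cell; um^2 -> mm^2.
        return self.links_needed(target_gbps) * self.pitch_um ** 2 / 1e6


def avg_tsv_hops_flat(layers: int) -> float:
    """Mean TSV hops from the base die to a uniformly chosen layer 1..layers,
    where layer k is k hops away."""
    return (layers + 1) / 2


def avg_tsv_hops_hierarchical(layers: int, substack: int) -> float:
    """Mean TSV hops when the stack is split into equal sub-stacks, each reached
    by its own optical link at its base (assumption), so only hops within a
    single sub-stack remain."""
    if layers % substack != 0:
        raise ValueError("layers must be a multiple of substack")
    return avg_tsv_hops_flat(substack)


# Hypothetical example parameters, chosen only for illustration.
tsv = VerticalLink("TSV", pitch_um=10.0, gbps_per_link=4.0)
opt = VerticalLink("optical", pitch_um=20.0, gbps_per_link=25.0)
for link in (tsv, opt):
    print(link.name, link.links_needed(1000.0), link.area_mm2(1000.0))
# TSV 250 0.025
# optical 40 0.016
print(avg_tsv_hops_flat(32), avg_tsv_hops_hierarchical(32, 8))
# 16.5 4.5
```

With a single sub-stack (`substack == layers`) the hierarchical model collapses to the flat one, which is a quick sanity check. The model only captures the first-order trends the abstract describes: footprint scales with pitch² divided by per-link data rate, and splitting a deep stack into directly reached sub-stacks bounds the TSV hop count by the sub-stack depth rather than the total stack depth.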
