Liquid Silicon: A Nonvolatile Fully Programmable Processing-in-Memory Processor With Monolithically Integrated ReRAM

The slowdown of the CMOS technology scaling, and the trade-off between efficiency and flexibility have fueled the exploration into novel architectures with emerging post-CMOS technology [e.g., resistive-RAM (RRAM)]. In this article, a nonvolatile fully programmable processing-in-memory (PIM) processor named Liquid Silicon is demonstrated, which combines the superior programmability of general-purpose computing devices [e.g., field-programmable gate array (FPGA)] and the high efficiency of domain-specific accelerators. Besides the general computing applications, Liquid Silicon is particularly well suited for artificial intelligence (AI)/machine learning and big data applications, which not only poses high computational/memory demand but also evolves rapidly. To fabricate the Liquid Silicon chip, the HfO<sub>2</sub> RRAM is monolithically integrated on top of the commercial 130 nm CMOS. Our measurement confirms that Liquid Silicon chip can operate reliably at a low voltage of 650 mV. It achieves 60.9 TOPS/W in performing neural network (NN) inferences, and 480 GOPS/W in performing content-based similarity search (a key big data application) at a nominal voltage supply of 1.2 V, showing <inline-formula> <tex-math notation="LaTeX">$3\times $ </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">$100\times $ </tex-math></inline-formula> improvement over the state-of-the-art domain-specific CMOS-/RRAM-based accelerators. In addition, it outperforms the latest nonvolatile FPGA in energy efficiency by <inline-formula> <tex-math notation="LaTeX">$3\times $ </tex-math></inline-formula> in general computing applications.

[1]  Peilin Song,et al.  1Mb 0.41 µm2 2T-2R cell nonvolatile TCAM with two-bit encoding and clocked self-referenced sensing , 2013, 2013 Symposium on VLSI Circuits.

[2]  Ashish Goel,et al.  Similarity search and locality sensitive hashing using ternary content addressable memories , 2010, SIGMOD Conference.

[3]  Jung Ho Ahn,et al.  DRAMA: An Architecture for Accelerated Processing Near Memory , 2015, IEEE Computer Architecture Letters.

[4]  Yu Wang,et al.  4.7 A 65nm ReRAM-enabled nonvolatile processor with 6× reduction in restore time and 4× higher clock frequency using adaptive data retention and self-write-termination nonvolatile logic , 2016, 2016 IEEE International Solid-State Circuits Conference (ISSCC).

[5]  Heng-Yuan Lee,et al.  Data retention statistics and modelling in HfO2 resistive switching memories , 2015, 2015 IEEE International Reliability Physics Symposium.

[6]  Laurent Grenouillet,et al.  Resistive RAM Endurance: Array-Level Characterization and Correction Techniques Targeting Deep Learning Applications , 2019, IEEE Transactions on Electron Devices.

[7]  Daisuke Suzuki,et al.  Fabrication of a 3000-6-input-LUTs embedded and block-level power-gated nonvolatile FPGA chip using p-MTJ-based logic-in-memory structure , 2015, 2015 Symposium on VLSI Circuits (VLSI Circuits).

[8]  Ken Takeuchi,et al.  A 6.8 TOPS/W Energy Efficiency, 1.5µW Power Consumption, Pulse Width Modulation Neuromorphic Circuits for Near-Data Computing with SSD , 2018, 2018 IEEE Asian Solid-State Circuits Conference (A-SSCC).

[9]  Shouyi Yin,et al.  An ultra-high energy-efficient reconfigurable processor for deep neural networks with binary / ternary weights in 28 nm CMOS , 2018 .

[10]  Leibo Liu,et al.  An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28NM CMOS , 2018, 2018 IEEE Symposium on VLSI Circuits.

[11]  Jeffrey Stuecheli,et al.  CAPI: A Coherent Accelerator Processor Interface , 2015, IBM J. Res. Dev..

[12]  G. Cibrario,et al.  Fundamental variability limits of filament-based RRAM , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).

[13]  Tadahiro Kuroda,et al.  BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS , 2017, 2017 Symposium on VLSI Circuits.

[14]  Jing Li,et al.  Liquid Silicon: A Data-Centric Reconfigurable Architecture Enabled by RRAM Technology , 2018, FPGA.

[15]  Meng-Fan Chang,et al.  A 65nm 4Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3ns and 55.8TOPS/W fully parallel product-sum operation for binary DNN edge processors , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[16]  Jing Li,et al.  Liquid Silicon-Monona: A Reconfigurable Memory-Oriented Computing Fabric with Scalable Multi-Context Support , 2018, ASPLOS.

[17]  Jing Li,et al.  1 Mb 0.41 µm² 2T-2R Cell Nonvolatile TCAM With Two-Bit Encoding and Clocked Self-Referenced Sensing , 2014, IEEE Journal of Solid-State Circuits.

[18]  S. Yang,et al.  Logic Synthesis and Optimization Benchmarks User Guide Version 3.0 , 1991 .

[19]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[20]  Meng-Fan Chang,et al.  A 462GOPs/J RRAM-based nonvolatile intelligent processor for energy harvesting IoE system featuring nonvolatile logics and processing-in-memory , 2017, 2017 Symposium on VLSI Technology.

[21]  Leibo Liu,et al.  A 141 UW, 2.46 PJ/Neuron Binarized Convolutional Neural Network Based Self-Learning Speech Recognition Processor in 28NM CMOS , 2018, 2018 IEEE Symposium on VLSI Circuits.

[22]  Jing Li,et al.  Reconfigurable in-memory computing with resistive memory crossbar , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[23]  Abbas El Gamal,et al.  Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory , 2012, 2012 IEEE International Solid-State Circuits Conference.