NVM-Based FPGA Block RAM With Adaptive SLC-MLC Conversion

The capacity of SRAM-based FPGA block RAM (BRAM) is constrained by the low density and high leakage power of current CMOS technology. In this paper, we propose a nonvolatile memory (NVM)-based BRAM architecture that enables flexible conversion between single-level cell (SLC) and multilevel cell (MLC) states. We show that, despite their higher per-access latency and power consumption, MLC-based BRAM blocks reduce the routing cost between logic units and on-chip data storage, which can lead to a shorter critical path delay and lower power consumption. We therefore propose an NVM BRAM architecture and an EDA framework that adaptively packs data into SLC- or MLC-state BRAMs during the FPGA design flow in order to achieve better system performance. We illustrate that simply replacing SRAM with NVM as the memory device leads to suboptimal system performance. In contrast, compared with operating all NVM BRAM blocks in the SLC state, which has better per-access latency and power consumption, the proposed hybrid SLC-MLC architecture and design flow improve the critical path delay by 18.51% while reducing system power by 25.83%. Moreover, compared with traditional “fast” SRAM-based BRAM blocks under the same BRAM area constraint, our hybrid NVM BRAM architecture improves the critical path delay by 8.55% on average, with an average system power reduction of 54.34%.
