Computing-in-Memory for Performance and Energy-Efficient Homomorphic Encryption

Homomorphic encryption (HE) allows direct computations on encrypted data. Despite numerous research efforts, the practicality of HE schemes remains to be demonstrated. One obstacle is the enormous size of the ciphertexts involved in HE computations, which degrades computational efficiency. Near-memory processing (NMP) and computing-in-memory (CiM), paradigms in which computation is done within the memory boundaries, are architectural solutions for reducing the latency and energy associated with data transfers in data-intensive applications such as HE. This article introduces CiM-HE, a CiM architecture that can support operations for the Brakerski/Fan–Vercauteren (B/FV) scheme, a somewhat homomorphic encryption scheme for general computation. The CiM-HE hardware consists of customized peripherals, such as sense amplifiers, adders, bit shifters, and sequencing circuits. The peripherals are based on CMOS technology and could support computations with memory cells of different technologies. Circuit-level simulations are used to evaluate our CiM-HE framework assuming a 6T-SRAM memory. We compare our CiM-HE implementation against: 1) two optimized CPU HE implementations and 2) a field-programmable gate array (FPGA)-based HE accelerator implementation. Compared with a CPU solution, CiM-HE obtains speedups between $4.6\times$ and $9.1\times$ and energy savings between $266.4\times$ and $532.8\times$ for homomorphic multiplications (the most expensive HE operation).
In addition, four end-to-end tasks (mean, variance, linear regression, and inference) run up to $1.1\times$, $7.7\times$, $7.1\times$, and $7.5\times$ faster, respectively (and are $301.1\times$, $404.6\times$, $532.3\times$, and $532.8\times$ more energy efficient). Compared with CPU-based HE in previous work, CiM-HE obtains a $14.3\times$ speedup and $>2600\times$ energy savings. Finally, our design offers a $2.2\times$ speedup with $88.1\times$ energy savings compared with a state-of-the-art FPGA-based accelerator.
