CryptoPIM: In-memory Acceleration for Lattice-based Cryptographic Hardware

Quantum computers promise to solve hard mathematical problems such as integer factorization and discrete logarithms in polynomial time, making standardized public-key cryptosystems insecure. Lattice-Based Cryptography (LBC) is a promising family of post-quantum public-key cryptographic protocols that could replace standardized public-key cryptography, thanks to its inherent resistance to quantum attacks, its efficiency, and its versatility. A key mathematical tool in LBC is the Number Theoretic Transform (NTT), a common method for computing polynomial multiplication. NTT-based polynomial multiplication is the most compute-intensive routine in LBC and requires acceleration for the practical deployment of LBC protocols. In this paper, we propose CryptoPIM, a high-throughput Processing In-Memory (PIM) accelerator for NTT-based polynomial multiplication that supports polynomials of degree up to 32k. Compared to the fastest FPGA implementation of an NTT-based multiplier, CryptoPIM achieves on average a 31x throughput improvement with the same energy and only a 28% performance reduction, thereby showing promise for the practical deployment of LBC.
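To make the role of the NTT concrete, the following is a minimal Python sketch of NTT-based polynomial multiplication in the ring Z_q[x]/(x^n + 1), the negacyclic setting common in Ring-LWE schemes. It assumes n is a power of two and q is a prime with q ≡ 1 (mod 2n); the parameter sets shown (n = 256, q = 7681 and n = 8, q = 17) are illustrative choices, not parameters taken from this paper, and the helper names (`find_psi`, `ntt`, `polymul_ntt`, `schoolbook_negacyclic`) are hypothetical. It uses a textbook iterative radix-2 Cooley-Tukey transform with a psi-twist to handle the negacyclic wraparound, and plain `%` instead of Montgomery or Barrett reduction. It is not a model of CryptoPIM's in-memory dataflow.

```python
import random


def find_psi(n, q):
    """Find a primitive 2n-th root of unity psi mod prime q (so psi^n = -1)."""
    if n & (n - 1) or (q - 1) % (2 * n):
        raise ValueError("need n a power of two and q = 1 mod 2n")
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # order is exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def bit_reverse(a):
    """Permute list a in place into bit-reversed index order."""
    n, j = len(a), 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]


def ntt(a, omega, q):
    """Iterative radix-2 Cooley-Tukey NTT; omega is a primitive n-th root of unity."""
    a = list(a)
    n = len(a)
    bit_reverse(a)
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)  # primitive length-th root
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % q
                a[start + k] = (u + v) % q          # butterfly: sum
                a[start + k + half] = (u - v) % q   # butterfly: difference
                w = w * w_len % q
        length <<= 1
    return a


def polymul_ntt(a, b, q):
    """Multiply a and b in Z_q[x]/(x^n + 1) via twisted forward/inverse NTTs."""
    n = len(a)
    psi = find_psi(n, q)
    omega = psi * psi % q
    psi_inv, omega_inv, n_inv = pow(psi, -1, q), pow(omega, -1, q), pow(n, -1, q)
    # Twist by psi^i so the cyclic NTT convolution becomes negacyclic.
    a_hat = ntt([x * pow(psi, i, q) % q for i, x in enumerate(a)], omega, q)
    b_hat = ntt([x * pow(psi, i, q) % q for i, x in enumerate(b)], omega, q)
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]  # pointwise product
    c = ntt(c_hat, omega_inv, q)                        # inverse NTT (unscaled)
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def schoolbook_negacyclic(a, b, q):
    """Reference O(n^2) product in Z_q[x]/(x^n + 1): x^n wraps to -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    rng = random.Random(1)
    n, q = 256, 7681
    a = [rng.randrange(q) for _ in range(n)]
    b = [rng.randrange(q) for _ in range(n)]
    assert polymul_ntt(a, b, q) == schoolbook_negacyclic(a, b, q)
    print("NTT product matches schoolbook for n =", n)
```

The transform replaces the O(n^2) schoolbook product with O(n log n) work: two forward NTTs, n pointwise multiplications, and one inverse NTT. The inner butterfly loops are regular and data-parallel, which is the kind of work hardware NTT accelerators typically spread across many parallel units.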
