Paper Information - CryptoPIM: In-memory Acceleration for Lattice-based Cryptographic Hardware

Quantum computers promise to solve hard mathematical problems such as integer factorization and discrete logarithms in polynomial time, making standardized public-key cryptosystems insecure. Lattice-Based Cryptography (LBC) is a promising family of post-quantum public-key cryptographic protocols that could replace standardized public-key cryptography, thanks to its inherent post-quantum resistance, efficiency, and versatility. A key mathematical tool in LBC is the Number Theoretic Transform (NTT), a common method for computing polynomial multiplication. It is the most compute-intensive routine and requires acceleration for practical deployment of LBC protocols. In this paper, we propose CryptoPIM, a high-throughput Processing In-Memory (PIM) accelerator for NTT-based polynomial multiplication that supports polynomials of degree up to 32k. Compared to the fastest FPGA implementation of an NTT-based multiplier, CryptoPIM achieves on average a 31x throughput improvement with the same energy and only a 28% performance reduction, thereby showing promise for practical deployment of LBC.
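To make the role of the NTT concrete, the following is a minimal software sketch of NTT-based polynomial multiplication in the ring Z_q[x]/(x^n + 1), the setting commonly used in ring-based lattice schemes. It is not the CryptoPIM design and says nothing about its in-memory data layout. The toy parameters (q = 17, n = 4, psi = 2) and all function names are assumptions chosen for illustration, not values from the paper.

```python
# Toy NTT-based negacyclic polynomial multiplication (requires Python 3.8+).
# Assumed parameters: q = 17, n = 4, psi = 2 (a primitive 2n-th root of unity mod q).

def ntt(a, omega, q):
    """Recursive radix-2 Cooley-Tukey NTT; len(a) must be a power of two
    and omega a primitive len(a)-th root of unity modulo the prime q."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q           # butterfly: upper half
        out[k + n // 2] = (even[k] - t) % q  # butterfly: lower half
        w = w * omega % q
    return out

def negacyclic_mul(a, b, psi, q):
    """Multiply a and b modulo (x^n + 1, q) using forward and inverse NTTs."""
    n = len(a)
    omega = psi * psi % q
    # Pre-scale by powers of psi so a cyclic convolution becomes negacyclic.
    a_s = [x * pow(psi, i, q) % q for i, x in enumerate(a)]
    b_s = [x * pow(psi, i, q) % q for i, x in enumerate(b)]
    # Pointwise product in the NTT domain.
    c_hat = [x * y % q for x, y in zip(ntt(a_s, omega, q), ntt(b_s, omega, q))]
    # Inverse NTT: transform with omega^-1, then scale by n^-1 and psi^-i.
    c_s = ntt(c_hat, pow(omega, -1, q), q)
    n_inv = pow(n, -1, q)
    psi_inv = pow(psi, -1, q)
    return [x * n_inv * pow(psi_inv, i, q) % q for i, x in enumerate(c_s)]

def schoolbook_negacyclic(a, b, q):
    """O(n^2) reference multiplication modulo (x^n + 1, q)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q  # x^n = -1
    return c

q, psi = 17, 2
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
fast = negacyclic_mul(a, b, psi, q)
slow = schoolbook_negacyclic(a, b, q)
print(fast, slow)
```

The schoolbook routine costs O(n^2) coefficient multiplications, whereas the NTT route costs O(n log n), which is why the transform dominates runtime and is the natural target for acceleration at the polynomial sizes the abstract mentions.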
