SecNDP: Secure Near-Data Processing with Untrusted Memory

Today’s data-intensive applications increasingly suffer from significant performance bottlenecks due to the limited memory bandwidth of the classical von Neumann architecture. Near-Data Processing (NDP) has been proposed to perform computation near memory or data storage to reduce data movement for improving performance and energy consumption. However, the untrusted NDP processing units (PUs) bring in new threats to workloads that are private and sensitive, such as private database queries and private machine learning inferences. Meanwhile, most existing secure hardware designs do not consider off-chip components trustworthy. Once data leaving the processor, they must be protected, e.g., via block cipher encryption. Unfortunately, current encryption schemes do not support computation over encrypted data stored in memory or storage, hindering the adoption of NDP techniques for sensitive workloads. In this paper, we propose SecNDP, a lightweight encryption and verification scheme for untrusted NDP devices to perform computation over ciphertext and verify the correctness of linear operations. Our encryption scheme leverages arithmetic secret sharing in secure Multi-Party Computation (MPC) to support operations over ciphertext, and uses counter-mode encryption to reduce the decryption latency. The security of the scheme is formally proven. Compared with a non-NDP baseline, secure computation with SecNDP significantly reduces the memory bandwidth usage while providing security guarantees. We evaluate SecNDP for two workloads of distinct memory access patterns. In the setting of eight NDP units, we show a speedup up to 7.46× and energy savings of 18% over an unprotected non-NDP baseline, approaching the performance gain attained by native NDP without protection. Furthermore, SecNDP does not require any security assumption on NDP to hold, thus, using the same threat model as existing secure processors. SecNDP can be implemented without changing the NDP protocols and their inherent hardware design.

[1]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[2]  Daniel J. Bernstein,et al.  FLOATING-POINT ARITHMETIC AND MESSAGE AUTHENTICATION , 2000 .

[3]  Mel Gorman,et al.  Understanding the Linux Virtual Memory Manager , 2004 .

[4]  Marten van Dijk,et al.  AEGIS: architecture for tamper-evident and tamper-resistant processing , 2003, ICS '03.

[5]  Sally A. McKee,et al.  Hitting the memory wall: implications of the obvious , 1995, CARN.

[6]  Sander Stuijk,et al.  Near-Memory Computing: Past, Present, and Future , 2019, Microprocess. Microsystems.

[7]  Benjamin C. Lee,et al.  PoisonIvy: Safe speculation for secure memory , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[8]  Hugo Krawczyk,et al.  MMH: Software Message Authentication in the Gbit/Second Rates , 1997, FSE.

[9]  O Seongil,et al.  Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology : Industrial Product , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).

[10]  Jung Ho Ahn,et al.  Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[11]  Student,et al.  THE PROBABLE ERROR OF A MEAN , 1908 .

[12]  Dan Boneh,et al.  Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware , 2018, ICLR.

[13]  Carole-Jean Wu,et al.  The Architectural Implications of Facebook's DNN-Based Personalized Recommendation , 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[14]  P. Donnelly,et al.  The UK Biobank resource with deep phenotyping and genomic data , 2018, Nature.

[15]  MutluOnur,et al.  A scalable processing-in-memory accelerator for parallel graph processing , 2015 .

[16]  Carole-Jean Wu,et al.  RecSSD: near data processing for solid state drive based recommendation inference , 2021, ASPLOS.

[17]  Phillip Rogaway,et al.  Authenticated-encryption with associated-data , 2002, CCS '02.

[18]  Sander Stuijk,et al.  NARMADA: Near-Memory Horizontal Diffusion Accelerator for Scalable Stencil Computations , 2019, 2019 29th International Conference on Field Programmable Logic and Applications (FPL).

[19]  Chanik Park,et al.  Enabling cost-effective data processing with smart SSD , 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).

[20]  John Viega,et al.  The Security and Performance of the Galois/Counter Mode (GCM) of Operation , 2004, INDOCRYPT.

[21]  Mohit Tiwari,et al.  SESAME: Software defined Enclaves to Secure Inference Accelerators with Multi-tenant Execution , 2020, ArXiv.

[22]  Marten van Dijk,et al.  Efficient memory integrity verification and encryption for secure processors , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[23]  Yinghai Lu,et al.  Deep Learning Recommendation Model for Personalization and Recommendation Systems , 2019, ArXiv.

[24]  Shobha Venkataraman,et al.  CrypTen: Secure Multi-Party Computation Meets Machine Learning , 2021, NeurIPS.

[25]  Ariel J. Feldman,et al.  Lest we remember: cold-boot attacks on encryption keys , 2008, CACM.

[26]  Brian Rogers,et al.  Improving Cost, Performance, and Security of Memory Encryption and Authentication , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[27]  Ramyad Hadidi,et al.  GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[28]  Kees G. W. Goossens,et al.  Improved Power Modeling of DDR SDRAMs , 2011, 2011 14th Euromicro Conference on Digital System Design.

[29]  Kiyoung Choi,et al.  PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[30]  Yang Liu,et al.  Willow: A User-Programmable SSD , 2014, OSDI.

[31]  Onur Mutlu,et al.  Ramulator: A Fast and Extensible DRAM Simulator , 2016, IEEE Computer Architecture Letters.

[32]  Zhiru Zhang,et al.  GuardNN: Secure DNN Accelerator for Privacy-Preserving Deep Learning , 2020, ArXiv.

[33]  Jianyu Huang,et al.  Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale , 2021, IEEE Micro.

[34]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[35]  Yan Solihin,et al.  ObfusMem: A low-overhead access obfuscation for trusted memories , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[36]  Minsoo Rhu,et al.  TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning , 2019, MICRO.

[37]  Andrew B. Kahng,et al.  CACTI-IO: CACTI with off-chip power-area-timing models , 2012, 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[38]  H.-S. Philip Wong,et al.  Resistive RAM-Centric Computing: Design and Modeling Methodology , 2017, IEEE Transactions on Circuits and Systems I: Regular Papers.

[39]  Dong Li,et al.  Processing-in-Memory for Energy-Efficient Neural Network Training: A Heterogeneous Approach , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[40]  Mark Tygert,et al.  Secure multiparty computations in floating-point arithmetic , 2020, Information and Inference: A Journal of the IMA.

[41]  Vinod Vaikuntanathan,et al.  On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption , 2012, STOC '12.

[42]  Frederik Vercauteren,et al.  Somewhat Practical Fully Homomorphic Encryption , 2012, IACR Cryptol. ePrint Arch..

[43]  Xuan-Tu Tran,et al.  A 45nm High-Throughput and Low Latency AES Encryption for Real-Time Applications , 2019, 2019 19th International Symposium on Communications and Information Technologies (ISCIT).

[44]  Meng-Fan Chang,et al.  33.2 A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing , 2020, 2020 IEEE International Solid- State Circuits Conference - (ISSCC).

[45]  Peter Rindal,et al.  ABY3: A Mixed Protocol Framework for Machine Learning , 2018, IACR Cryptol. ePrint Arch..

[46]  Sudhakar Yalamanchili,et al.  Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[47]  Martin D. Schatz,et al.  RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing , 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).

[48]  Hui Zhang,et al.  SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD , 2020, IEEE Computer Architecture Letters.

[49]  Carsten Lund,et al.  Efficient probabilistically checkable proofs and applications to approximations , 1993, STOC.

[50]  Gu-Yeon Wei,et al.  Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[51]  Rosario Cammarota,et al.  Machine Learning IP Protection , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[52]  Amrita Mazumdar,et al.  Application Codesign of Near-Data Processing for Similarity Search , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[53]  Yongqin Wang,et al.  DarKnight: A Data Privacy Scheme for Training and Inference of Deep Neural Networks , 2020, ArXiv.

[54]  Craig Gentry,et al.  Non-interactive Verifiable Computing: Outsourcing Computation to Untrusted Workers , 2010, CRYPTO.

[55]  Nicholas J. Higham,et al.  INVERSE PROBLEMS NEWSLETTER , 1991 .

[56]  Masahiro Yagisawa,et al.  Fully Homomorphic Encryption without bootstrapping , 2015, IACR Cryptol. ePrint Arch..

[57]  Srinivas Devadas,et al.  MI6: Secure Enclaves in a Speculative Out-of-Order Processor , 2018, MICRO.

[58]  Ivan Damgård,et al.  Multiparty Computation from Somewhat Homomorphic Encryption , 2012, IACR Cryptol. ePrint Arch..

[59]  Srinath T. V. Setty,et al.  Making argument systems for outsourced computation practical (sometimes) , 2012, NDSS.

[60]  Hsien-Hsin S. Lee,et al.  High efficiency counter mode security architecture via prediction and precomputation , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[61]  Jung Ho Ahn,et al.  NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[62]  Jung Hee Cheon,et al.  Homomorphic Encryption for Arithmetic of Approximate Numbers , 2017, ASIACRYPT.

[63]  Jaehyuk Huh,et al.  Common Counters: Compressed Encryption Counters for Secure GPU Memory , 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[64]  Tadayoshi Kohno,et al.  CWC: A High-Performance Conventional Authenticated Encryption Mode , 2004, FSE.

[65]  Rodrigo Bruno,et al.  Graviton: Trusted Execution Environments on GPUs , 2018, OSDI.

[66]  Satish Narayanasamy,et al.  InvisiMem: Smart memory defenses for memory bus side channel , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[67]  O Seongil,et al.  25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications , 2021, 2021 IEEE International Solid- State Circuits Conference (ISSCC).

[68]  Xiao Yu,et al.  Vessels: efficient and scalable deep learning prediction on trusted processors , 2020, SoCC.

[69]  Gururaj Saileshwar,et al.  SYNERGY: Rethinking Secure-Memory Design for Error-Correcting Memories , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[70]  Sung Kyu Lim,et al.  FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction , 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[71]  Brian Rogers,et al.  Using Address Independent Seed Encryption and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[72]  Hsien-Hsin S. Lee,et al.  Cheetah: Optimizing and Accelerating Homomorphic Encryption for Private Inference , 2020, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).