Look Ahead ORAM: Obfuscating Addresses in Recommendation Model Training

In the cloud computing era, data privacy is a critical concern, and memory access patterns can leak private information. This leakage is particularly challenging for deep learning recommendation models, where data associated with a user is used to train the model. Recommendation models use embedding tables to map categorical data (embedding table indices) into a large vector space that is easier for recommendation systems to learn from. Categorical data is directly linked to a user's private interactions with a social media platform, such as the news articles read or the ads clicked. Thus, merely knowing which embedding indices are accessed can compromise a user's privacy. Oblivious RAM (ORAM) [4] and its enhancements [15] [18] have been proposed to prevent memory access patterns from leaking information. ORAM schemes hide access patterns by fetching multiple data blocks per demand fetch and then shuffling the locations of blocks after each access. In this paper, we propose Look Ahead ORAM, a new PathORAM architecture designed to protect users' input privacy when training recommendation models. Look Ahead ORAM exploits the fact that, during training, the embedding table indices that will be accessed in a future batch are known beforehand. It preprocesses future training samples to identify indices that will co-occur and groups these accesses into a large superblock. Look Ahead ORAM performs this "same-path" assignment by grouping multiple data blocks into superblocks; accessing a superblock requires fetching fewer data blocks than accessing the same blocks individually, so Look Ahead ORAM effectively reduces the number of reads and writes per access. Look Ahead ORAM also introduces a fat-tree structure for PathORAM, i.e., a tree with variable bucket sizes. Look Ahead ORAM achieves a 2x speedup over PathORAM and reduces the bandwidth requirement by 3.15x while providing the same security guarantees as PathORAM.
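To make the superblock idea concrete, the following is a minimal Python sketch of the preprocessing step described above: it scans the embedding indices of upcoming training batches, groups indices that co-occur in the same batch into fixed-size superblocks, and assigns every index in a superblock to a single PathORAM leaf so that one path access can serve the whole group. All names (build_superblocks, SUPERBLOCK_SIZE, NUM_LEAVES, future_batches) are illustrative assumptions, not the paper's implementation, which additionally handles stash management, eviction, and the variable-bucket fat-tree layout.

```python
# Minimal sketch (assumed names, not the paper's code) of Look Ahead ORAM's
# preprocessing: group embedding indices that co-occur in upcoming batches
# into superblocks and give each superblock a single PathORAM leaf, so one
# path read/write covers several demand accesses.

import random

SUPERBLOCK_SIZE = 4      # assumed number of embedding rows per superblock
NUM_LEAVES = 1 << 10     # assumed number of leaves in the PathORAM tree

def build_superblocks(future_batches):
    """Group embedding-table indices that co-occur in future training batches.

    future_batches: iterable of lists of embedding indices, in the order they
    will be consumed during training (known ahead of time).
    Returns the superblocks and a map from each index to its assigned leaf.
    """
    superblocks = []
    seen = set()
    for batch in future_batches:
        current = []
        for idx in batch:
            if idx in seen:          # already placed in an earlier superblock
                continue
            seen.add(idx)
            current.append(idx)
            if len(current) == SUPERBLOCK_SIZE:
                superblocks.append(tuple(current))
                current = []
        if current:                  # leftover co-occurring indices form a smaller superblock
            superblocks.append(tuple(current))

    # "Same-path" assignment: all indices in a superblock map to one leaf.
    leaf_of = {}
    for sb in superblocks:
        leaf = random.randrange(NUM_LEAVES)
        for idx in sb:
            leaf_of[idx] = leaf
    return superblocks, leaf_of

# Example: embedding indices of three upcoming mini-batches.
batches = [[3, 17, 42, 7], [17, 99, 3, 5], [42, 5, 128, 256]]
superblocks, leaf_of = build_superblocks(batches)
print(superblocks)                 # [(3, 17, 42, 7), (99, 5), (128, 256)]
print(leaf_of[3] == leaf_of[17])   # True: co-occurring indices share a path
```

Because all members of a superblock share one leaf, a single path fetch returns every block in the group, which is the source of the claimed reduction in reads and writes per access.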

[1] Dik Lun Lee, et al. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba, 2018, KDD.

[2] Yongqin Wang, et al. Privacy-Preserving Inference in Machine Learning Services Using Trusted Execution Environments, 2019, arXiv.

[4] Rafail Ostrovsky, et al. Software protection and simulation on oblivious RAMs, 1996, JACM.

[5] Maxim Naumov, et al. On the Dimensionality of Embeddings for Sparse Features and Data, 2019, arXiv.

[6] Srinivas Devadas, et al. Design space exploration and optimization of path oblivious RAM in secure processors, 2013, ISCA.

[7] Yongqin Wang, et al. DarKnight: A Data Privacy Scheme for Training and Inference of Deep Neural Networks, 2020, arXiv.

[8] Carlos V. Rozas, et al. Intel® Software Guard Extensions: EPID Provisioning and Attestation Services, 2016.

[9] Charles E. Leiserson, et al. Fat-trees: Universal networks for hardware-efficient supercomputing, 1985, IEEE Transactions on Computers.

[10] Yinghai Lu, et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems, 2019, arXiv.

[11] Srinivas Devadas, et al. PrORAM: Dynamic prefetcher for Oblivious RAM, 2015, ISCA.

[12] Elaine Shi, et al. Path ORAM: an extremely simple oblivious RAM protocol, 2012, CCS.

[13] Dan Boneh, et al. Architectural support for copy and tamper resistant software, 2000, ASPLOS.

[14] Rong Jin, et al. Deep Learning at Alibaba, 2017, IJCAI.

[15] Elaine Shi, et al. Ring ORAM: Closing the Gap Between Small and Large Client Storage Oblivious RAM, 2014, IACR Cryptology ePrint Archive.

[16] Dongrui Fan, et al. Streamline Ring ORAM Accesses through Spatial and Temporal Optimization, 2021, HPCA.