Make Some ROOM for the Zeros: Data Sparsity in Secure Distributed Machine Learning

Exploiting data sparsity is crucial for the scalability of many data analysis tasks. However, while there is an increasing interest in efficient secure computation protocols for distributed machine learning, data sparsity has so far not been considered in a principled way in that setting. We propose sparse data structures together with their corresponding secure computation protocols to address common data analysis tasks while utilizing data sparsity. In particular, we define a Read-Only Oblivious Map primitive (ROOM) for accessing elements in sparse structures, and present several instantiations of this primitive with different trade-offs. Then, using ROOM as a building block, we propose protocols for basic linear algebra operations such as Gather, Scatter, and multiple variants of sparse matrix multiplication. Our protocols are easily composable by using secret sharing. We leverage this, at the highest level of abstraction, to build secure protocols for non-parametric models (k-nearest neighbors and naive Bayes classification) and parametric models (logistic regression) that enable secure analysis on high-dimensional datasets. The experimental evaluation of our protocol implementations demonstrates a manyfold improvement in the efficiency over state-of-the-art techniques across all applications. Our system is designed and built mirroring the modular architecture in scientific computing and machine learning frameworks, and inspired by the Sparse BLAS standard.

[1]  Allan Borodin,et al.  Fast Modular Transforms via Division , 1972, SWAT.

[2]  Iain S. Duff,et al.  An overview of the sparse basic linear algebra subprograms: The new standard from the BLAS technical forum , 2002, TOMS.

[3]  Karthik Gururaj,et al.  GenomicsDB : Storing Genome Data as Sparse Columnar Arrays , 2017 .

[4]  Vladimir Kolesnikov,et al.  Improved Garbled Circuit: Free XOR Gates and Applications , 2008, ICALP.

[5]  Stratis Ioannidis,et al.  GraphSC: Parallel Secure Computation Made Easy , 2015, 2015 IEEE Symposium on Security and Privacy.

[6]  Jonathan Katz,et al.  Private Set Intersection: Are Garbled Circuits Better than Custom Protocols? , 2012, NDSS.

[7]  Gabor T. Marth,et al.  A global reference for human genetic variation , 2015, Nature.

[8]  James Bennett,et al.  The Netflix Prize , 2007 .

[9]  Shafi Goldwasser,et al.  Machine Learning Classification over Encrypted Data , 2015, NDSS.

[10]  Phillipp Schoppmann,et al.  Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue , 2020, Proc. Priv. Enhancing Technol..

[11]  Peeter Laud,et al.  Parallel Oblivious Array Access for Secure Multiparty Computation and Privacy-Preserving Minimum Spanning Trees , 2015, Proc. Priv. Enhancing Technol..

[12]  Srinath T. V. Setty,et al.  PIR with Compressed Queries and Amortized Query Processing , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[13]  Yuval Ishai,et al.  Function Secret Sharing , 2015, EUROCRYPT.

[14]  Yao Lu,et al.  Oblivious Neural Network Predictions via MiniONN Transformations , 2017, IACR Cryptol. ePrint Arch..

[15]  Payman Mohassel,et al.  SecureML: A System for Scalable Privacy-Preserving Machine Learning , 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[16]  Jingren Zhou Sort-Merge Join , 2009, Encyclopedia of Database Systems.

[17]  Oded Goldreich,et al.  Foundations of Cryptography: Volume 2, Basic Applications , 2004 .

[18]  Elad Hoffer,et al.  Train longer, generalize better: closing the generalization gap in large batch training of neural networks , 2017, NIPS.

[19]  Rafail Ostrovsky,et al.  Software protection and simulation on oblivious RAMs , 1996, JACM.

[20]  Benny Pinkas,et al.  Scalable Private Set Intersection Based on OT Extension , 2018, IACR Cryptol. ePrint Arch..

[21]  Moni Naor,et al.  Private Information Retrieval by Keywords , 1998, IACR Cryptol. ePrint Arch..

[22]  Christopher Potts,et al.  Learning Word Vectors for Sentiment Analysis , 2011, ACL.

[23]  Kenneth E. Batcher,et al.  Sorting networks and their applications , 1968, AFIPS Spring Joint Computing Conference.

[24]  Yuval Ishai,et al.  Protecting data privacy in private information retrieval schemes , 1998, STOC '98.

[25]  Andrew Chi-Chih Yao,et al.  How to Generate and Exchange Secrets (Extended Abstract) , 1986, FOCS.

[26]  Yehuda Lindell,et al.  A Proof of Security of Yao’s Protocol for Two-Party Computation , 2009, Journal of Cryptology.

[27]  Stratis Ioannidis,et al.  Privacy-preserving matrix factorization , 2013, CCS.

[28]  Claudio Orlandi,et al.  Combining Private Set-Intersection with Secure Two-Party Computation , 2018, IACR Cryptol. ePrint Arch..

[29]  Donald Beaver,et al.  Efficient Multiparty Protocols Using Circuit Randomization , 1991, CRYPTO.

[30]  Anantha Chandrakasan,et al.  Gazelle: A Low Latency Framework for Secure Neural Network Inference , 2018, IACR Cryptol. ePrint Arch..

[31]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[32]  Hao Chen,et al.  Labeled PSI from Fully Homomorphic Encryption with Malicious Security , 2018, IACR Cryptol. ePrint Arch..

[33]  David Evans,et al.  Obliv-C: A Language for Extensible Data-Oblivious Computation , 2015, IACR Cryptol. ePrint Arch..

[34]  Phillipp Schoppmann,et al.  Private Nearest Neighbors Classification in Federated Databases , 2018, IACR Cryptol. ePrint Arch..