PrivPy: General and Scalable Privacy-Preserving Data Mining

Privacy is a big hurdle for collaborative data mining across multiple parties. We present multi-party computation (MPC) framework designed for large-scale data mining tasks. PrivPy combines an easy-to-use and highly flexible Python programming interface with state-of-the-art secret-sharing-based MPC backend. With essential data types and operations (such as NumPy arrays and broadcasting), as well as automatic code-rewriting, programmers can write modern data mining algorithms conveniently in familiar Python. We demonstrate that we can support many real-world machine learning algorithms (e.g. logistic regression and convolutional neural networks) and large datasets (e.g. 5000-by-1-million matrix) with minimal algorithm porting effort.

[1]  Octavian Catrina,et al.  Improved Primitives for Secure Multiparty Integer Computation , 2010, SCN.

[2]  Octavian Catrina,et al.  Secure Computation with Fixed-Point Numbers , 2010, Financial Cryptography.

[3]  Ahmad-Reza Sadeghi,et al.  TASTY: tool for automating secure two-party computations , 2010, CCS '10.

[4]  Abhi Shelat,et al.  Billion-Gate Secure Computation with Malicious Adversaries , 2012, USENIX Security Symposium.

[5]  Wayne Luk,et al.  Pipeline vectorization , 2001, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[6]  Alex J. Malozemoff,et al.  Efficiently Enforcing Input Validity in Secure Two-party Computation , 2016, IACR Cryptol. ePrint Arch..

[7]  Jun Sakuma,et al.  Using Fully Homomorphic Encryption for Statistical Analysis of Categorical, Ordinal and Numerical Data , 2016, NDSS.

[8]  F. Maxwell Harper,et al.  The MovieLens Datasets: History and Context , 2016, TIIS.

[9]  Dan Bogdanov,et al.  Sharemind: A Framework for Fast Privacy-Preserving Computations , 2008, ESORICS.

[10]  Yihua Zhang,et al.  PICCO: a general-purpose compiler for private distributed computation , 2013, CCS.

[11]  Peeter Laud,et al.  Combining Differential Privacy and Secure Multiparty Computation , 2015, ACSAC.

[12]  Tjalling J. Ypma,et al.  Historical Development of the Newton-Raphson Method , 1995, SIAM Rev..

[13]  Roksana Boreli,et al.  Applying Differential Privacy to Matrix Factorization , 2015, RecSys.

[14]  Yehuda Lindell,et al.  High-Throughput Semi-Honest Secure Three-Party Computation with an Honest Majority , 2016, IACR Cryptol. ePrint Arch..

[15]  Yann LeCun,et al.  Deep learning & convolutional networks , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).

[16]  Ahmad-Reza Sadeghi,et al.  Automated Synthesis of Optimized Circuits for Secure Computation , 2015, CCS.

[17]  Michael Zohner,et al.  ABY - A Framework for Efficient Mixed-Protocol Secure Two-Party Computation , 2015, NDSS.

[18]  Shafi Goldwasser,et al.  Machine Learning Classification over Encrypted Data , 2015, NDSS.

[19]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, FOCS 1982.

[20]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[21]  Wei Xu,et al.  PEM: A Practical Differentially Private System for Large-Scale Cross-Institutional Data Mining , 2017, ECML/PKDD.

[22]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[23]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[24]  David Evans,et al.  Obliv-C: A Language for Extensible Data-Oblivious Computation , 2015, IACR Cryptol. ePrint Arch..

[25]  Yehuda Lindell,et al.  Generalizing the SPDZ Compiler For Other Protocols , 2018, IACR Cryptol. ePrint Arch..

[26]  Michael O. Rabin,et al.  How To Exchange Secrets with Oblivious Transfer , 2005, IACR Cryptol. ePrint Arch..

[27]  Arnold Neumaier,et al.  Introduction to Numerical Analysis , 2001 .

[28]  Jeffrey S. Foster,et al.  Understanding source code evolution using abstract syntax tree matching , 2005, MSR.

[29]  Payman Mohassel,et al.  SecureML: A System for Scalable Privacy-Preserving Machine Learning , 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[30]  Marcel Keller,et al.  Practical Covertly Secure MPC for Dishonest Majority - Or: Breaking the SPDZ Limits , 2013, ESORICS.

[31]  Ivan Damgård,et al.  Multiparty Computation from Somewhat Homomorphic Encryption , 2012, IACR Cryptol. ePrint Arch..

[32]  Gaël Varoquaux,et al.  The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.

[33]  Florian Kerschbaum,et al.  L1 - An Intermediate Language for Mixed-Protocol Secure Computation , 2011, 2011 IEEE 35th Annual Computer Software and Applications Conference.

[34]  Jason Ko,et al.  Handwritten Digits Recognition , 2008 .

[35]  Jan Willemson,et al.  Hybrid Model of Fixed and Floating Point Numbers in Secure Multiparty Computations , 2014, ISC.

[36]  Yao Lu,et al.  Oblivious Neural Network Predictions via MiniONN Transformations , 2017, IACR Cryptol. ePrint Arch..

[37]  Yitao Duan,et al.  P4P: Practical Large-Scale Privacy-Preserving Distributed Computation Robust against Malicious Users , 2010, USENIX Security Symposium.

[38]  Adi Shamir,et al.  How to share a secret , 1979, CACM.

[39]  Shai Halevi,et al.  Algorithms in HElib , 2014, CRYPTO.

[40]  Benny Pinkas,et al.  FairplayMP: a system for secure multi-party computation , 2008, CCS.

[41]  Dan Bogdanov,et al.  Domain-Polymorphic Programming of Privacy-Preserving Applications , 2014, PLAS@ECOOP.

[42]  Kartik Nayak,et al.  ObliVM: A Programming Framework for Secure Computation , 2015, 2015 IEEE Symposium on Security and Privacy.

[43]  Ye Zhang,et al.  Fast and Secure Three-party Computation: The Garbled Circuit Approach , 2015, IACR Cryptol. ePrint Arch..

[44]  Yunfeng Wang,et al.  Privacy Preserving Data Mining Research: Current Status and Key Issues , 2007, International Conference on Computational Science.

[45]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[46]  A. Azzouz 2011 , 2020, City.

[47]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[48]  Jan Willemson,et al.  Secure floating point arithmetic and private satellite collision analysis , 2015, International Journal of Information Security.

[49]  Peter Rindal,et al.  ABY3: A Mixed Protocol Framework for Machine Learning , 2018, IACR Cryptol. ePrint Arch..