Preserving Model Privacy for Machine Learning in Distributed Systems

Machine-learning-based data classification is a widely used data mining technique. By learning from massive data collected from the real world, data classification helps learners discover hidden data patterns, which the learned model represents in a form specific to each machine learning scheme. Based on such models, a user can determine whether new incoming data belongs to an existing class, or multiple entities can test the similarity of their datasets. However, due to data locality and privacy concerns, it is infeasible for large-scale distributed systems to share each individual’s datasets for classification or similarity testing. On the one hand, the learned model is an entity’s private asset and may leak private information, so it should be well protected from all other non-collaborative entities. On the other hand, the new incoming data may contain sensitive information that cannot be disclosed directly for classification. To address these privacy issues, we propose an approach that preserves model privacy during data classification and similarity evaluation in distributed systems. With our scheme, neither new data nor learned models are directly revealed during the classification and similarity evaluation procedures. We evaluate the privacy preservation, feasibility, and efficiency of the proposed scheme through extensive real-world experiments.
