Privacy-Preserving Data Classification and Similarity Evaluation for Distributed Systems

Data classification is a widely used data mining technique for big data analysis. By training on massive data collected from the real world, data classification helps learners discover hidden data patterns. Beyond training, given a model trained on collected data, a user can determine whether a new incoming data item belongs to an existing class, or multiple distributed entities can collaborate to test the similarity of their trained results. However, due to data locality and privacy concerns, it is infeasible for entities in large-scale distributed systems to share their individual datasets with each other for a data similarity check. On the one hand, the trained model is an entity's private asset and may leak private information, so it should be well protected from all non-collaborative entities. On the other hand, new incoming data may contain sensitive information that cannot be disclosed directly for classification. To address these privacy issues, we propose a privacy-preserving data classification and similarity evaluation scheme for distributed systems. With our scheme, neither newly arriving data nor trained models are directly revealed during the classification and similarity evaluation procedures. The proposed scheme can be applied in many fields that use data classification and evaluation. Through extensive real-world experiments, we also evaluate the privacy preservation, feasibility, and efficiency of the proposed scheme.
