Privacy Preserving Distributed Machine Learning with Federated Learning

Edge computing and distributed machine learning have advanced to a level that can transform how organizations operate. Distributed devices, such as Internet of Things (IoT) devices, often produce large amounts of data. This eventually results in big data that can be vital for uncovering hidden patterns and other insights in numerous fields such as healthcare, banking, and policing. Data from areas such as healthcare and banking can contain sensitive information that may become public if it is not appropriately sanitized. Federated learning (FedML) is a recently developed distributed machine learning (DML) approach that tries to preserve privacy by bringing the training of an ML model to the data owners instead of collecting their data centrally. However, the literature describes attack methods, such as membership inference, that exploit the vulnerabilities of ML models as well as of the coordinating servers to retrieve private data. Hence, FedML needs additional measures to guarantee data privacy. Furthermore, big data often requires more resources than a standard computer provides. This paper addresses these issues by proposing a distributed perturbation algorithm, named DISTPAB, for the privacy preservation of horizontally partitioned data. DISTPAB alleviates computational bottlenecks by distributing the task of privacy preservation. It exploits the asymmetry of resources in a distributed environment, which can contain resource-constrained devices as well as high-performance computers. Experiments show that DISTPAB provides high accuracy, efficiency, scalability, and attack resistance. Further experiments on privacy-preserving FedML show that DISTPAB is an effective solution for stopping privacy leaks in DML while preserving high data utility.
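The abstract does not spell out DISTPAB's perturbation steps, so the sketch below only illustrates the general idea of perturbing horizontally partitioned data in a distributed way. It assumes a plain random rotation plus translation, a generic geometric perturbation swapped in here for illustration. It is not DISTPAB itself. A coordinator, standing in for the high-performance node, draws the parameters once. Each data owner then applies them locally to its own records, so raw rows never leave the owner. Because the rotation matrix is orthogonal, Euclidean distances between records are preserved, including distances between records held by different owners. This is one way a perturbation can retain utility for distance-based models. All function names (`random_orthogonal`, `make_params`, `perturb_partition`, `invert_partition`) are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def random_orthogonal(d: int, rng: random.Random) -> Matrix:
    """Build a random d x d orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    rows: Matrix = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            proj = sum(vi * ri for vi, ri in zip(v, r))
            v = [vi - proj * ri for vi, ri in zip(v, r)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-10:  # redraw in the (unlikely) near-dependent case
            rows.append([vi / norm for vi in v])
    return rows


def make_params(d: int, seed: int) -> Tuple[Matrix, Vector]:
    """Coordinator side (hypothetical): draw one rotation Q and translation t."""
    rng = random.Random(seed)
    q = random_orthogonal(d, rng)
    t = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    return q, t


def perturb_partition(rows: List[Vector], q: Matrix, t: Vector) -> List[Vector]:
    """Data-owner side: map each local record x to y = Qx + t."""
    return [
        [sum(qij * xj for qij, xj in zip(qi, x)) + ti for qi, ti in zip(q, t)]
        for x in rows
    ]


def invert_partition(rows: List[Vector], q: Matrix, t: Vector) -> List[Vector]:
    """Undo the map with x = Q^T (y - t); shows why (Q, t) must stay secret."""
    d = len(t)
    out = []
    for y in rows:
        shifted = [yi - ti for yi, ti in zip(y, t)]
        out.append([sum(q[i][j] * shifted[i] for i in range(d)) for j in range(d)])
    return out


if __name__ == "__main__":
    q, t = make_params(d=3, seed=7)
    # Horizontal partitioning: same attributes, different records per owner.
    owner_a = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
    owner_b = [[2.0, 0.0, 1.0]]
    pa = perturb_partition(owner_a, q, t)
    pb = perturb_partition(owner_b, q, t)
    # The two distances agree up to floating-point rounding.
    print(math.dist(owner_a[0], owner_b[0]), math.dist(pa[0], pb[0]))
```

Anyone who obtains `q` and `t` can undo the transformation exactly, and this sketch adds no noise. On its own, therefore, it offers no formal privacy guarantee. A practical scheme would still have to protect or further randomize these parameters.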