A survey on federated learning in data mining

Data mining is the process of extracting previously unknown, hidden, and potentially useful information from data. However, the data island problem makes it difficult to collect and analyze scattered data, and mining that data also raises privacy and security concerns. Federated learning is a collaborative, decentralized approach in which multiple participants jointly train a shareable global model while privacy-sensitive data stays on local devices, which makes it a promising way to address both decentralized data and privacy protection. Although federated learning has been widely used, few systematic studies have examined federated learning in data mining. Hence, unlike prior reviews in this field, we provide a comprehensive summary and a novel taxonomy of the applications of federated learning in data mining. This article starts with a thorough description of the relevant definitions and concepts, followed by an in-depth investigation of the challenges faced by federated learning. In this context, we group the major applications of federated learning in data mining into four categories, namely education, healthcare, IoT, and intelligent transportation, and discuss each of them comprehensively. Finally, we discuss four promising directions for further research: privacy enhancement, improvement of communication efficiency, heterogeneous system processing, and reduction of economic costs.
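To make the idea of training a shared global model without moving local data more concrete, the following is a minimal sketch, not the survey's own method. It assumes a FedAvg-style scheme (weighted averaging of client models by local dataset size) and a toy one-dimensional linear model; the function names `local_update`, `federated_average`, and `run_rounds` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Weights = List[float]              # [slope, intercept] of the toy model y = w*x + b
Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error on this client's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Combine client models, weighting each by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_rounds(clients: List[Dataset], rounds: int = 200) -> Weights:
    """Server loop: broadcast the global model, collect updates, average."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        # Only model parameters are exchanged; raw data stays with each client.
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Two clients hold disjoint samples of the same relation y = 2x + 1.
    client_a = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)]
    client_b = [(x, 2 * x + 1) for x in (1.5, 2.0)]
    print(run_rounds([client_a, client_b]))
```

In this sketch the server only ever sees the two model parameters from each client. Real deployments typically add the protections the abstract lists under privacy enhancement, since parameter updates alone can still leak information.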
