Evaluating Efficiency and Effectiveness of Federated Learning Approaches in Knowledge Extraction Tasks

Federated Learning is a valuable instrument for building AI-based systems that preserve the privacy and security of sensitive data. Its core idea is to move computation to the data at the edges rather than move the data to a central location, which avoids the collection, sharing, and use of such data by third parties. A robust federated learning system should prevent malicious inference over both the data exchanged during training and the final trained model, while ensuring that the resulting model retains acceptable predictive accuracy. This study presents a preliminary analysis of the effectiveness and efficiency of a federated approach in delivering adequate classification accuracy together with data security. We consider a real case study from the ANDROIDS project, concerning the application of machine learning-based systems to support the detection of mental-health disorders. In this setting, large amounts of sensitive patient information are collected and must be obfuscated or anonymized to provide a preliminary level of protection. The real bottleneck is the difficulty of extracting all sensitive data for anonymization, owing to the volume of data to be handled and the considerable effort required. We propose a Natural Language Processing approach for detecting and classifying sensitive knowledge, trained in a federated fashion. We evaluate the accuracy decay and the latency introduced by the decentralized learning approach relative to the same task on the same data performed in a centralized way. Preliminary results show that effectiveness can be achieved by properly tuning the federated algorithm and by choosing an appropriate number of participants in the federation.
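
To make the idea of moving computation to the data concrete, the following minimal sketch shows one server-side aggregation step in the style of federated averaging (FedAvg). It is an illustration only: the abstract does not state which aggregation rule the study uses, and the function name `federated_average`, the flat-list representation of model parameters, and the example values are all assumptions introduced here. Each participant is assumed to train locally and send back only its parameter vector and its local sample count, never its raw records.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Aggregate client parameter vectors, weighting each by its sample count.

    Hypothetical helper for illustration; not taken from the study.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client parameter vector")

    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    aggregated = [0.0] * dim

    for weights, n_samples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        # Each client contributes in proportion to its share of the data.
        for i, value in enumerate(weights):
            aggregated[i] += value * n_samples / total

    return aggregated


# Two hypothetical participants: one with 1 local sample, one with 3.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In this sketch the number of participants and the relative size of their local datasets directly shape the aggregated model, which is one reason the choice of federation size can affect accuracy relative to centralized training.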