Sensitive Data Anonymization Using Genetic Algorithms for SOM-based Clustering

Improving privacy protection through intelligent methods has become a major focus of current research. However, despite the technological advances made in addressing privacy concerns, the literature does not yet offer frameworks and methods that protect privacy from multiple perspectives while also accounting for the accuracy and efficiency of the system's overall processes. In this work, we focus on protecting sensitive data in Self-Organizing Map (SOM)-based clustering, and we anonymize the data with Genetic Algorithm (GA) techniques in order to improve privacy without significantly degrading the accuracy and efficiency of the overall process. We organize the dataset into subspaces on distributed local servers according to their information-theoretic distance from one another, and then generalize attribute values only to the minimum extent required, so that both the data disclosure probability and the information loss are kept to a minimum. Our analysis shows that the protocol supports clustering without greatly exposing individual privacy and that the privacy requirements incur only negligible additional cost and information loss.
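To make the idea of generalizing attribute values "to the minimum extent required" more concrete, the following is a minimal sketch of a GA that searches over per-attribute generalization levels. It is an illustration under stated assumptions, not the protocol described in this work: the toy records, the two-attribute hierarchy (age bucket widths and ZIP-code masking), the k-anonymity check used as a stand-in for the disclosure constraint, the normalized-level loss metric, and all function names are hypothetical. The SOM clustering and the distributed local servers are omitted entirely.

```python
import random
from collections import Counter

# Hypothetical toy records (age, zip_code); not data from the paper.
RECORDS = [(23, "47601"), (27, "47602"), (25, "47605"), (41, "47911"),
           (45, "47915"), (48, "47919"), (33, "47701"), (36, "47705")]
AGE_WIDTHS = (1, 5, 10, 100)   # assumed age bucket width at levels 0..3
MAX_LEVEL = (3, 5)             # highest generalization level per attribute

def generalize(record, levels):
    """Apply the chosen generalization level to each attribute of a record."""
    age, zip_code = record
    age_bucket = age // AGE_WIDTHS[levels[0]]
    keep = len(zip_code) - levels[1]
    return (age_bucket, zip_code[:keep] + "*" * levels[1])

def is_k_anonymous(levels, k):
    """True if every generalized record is shared by at least k records."""
    counts = Counter(generalize(r, levels) for r in RECORDS)
    return min(counts.values()) >= k

def information_loss(levels):
    """Assumed loss metric: mean of the normalized generalization levels."""
    return sum(l / m for l, m in zip(levels, MAX_LEVEL)) / len(levels)

def fitness(levels, k):
    """Lower is better; solutions violating k-anonymity are penalized."""
    penalty = 0.0 if is_k_anonymous(levels, k) else 10.0
    return information_loss(levels) + penalty

def evolve(k=2, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # Seed the population with the fully generalized (always feasible) solution.
    pop = [tuple(MAX_LEVEL)]
    while len(pop) < pop_size:
        pop.append(tuple(rng.randint(0, m) for m in MAX_LEVEL))
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, k))
        parents = pop[:pop_size // 2]          # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < 0.2:             # mutate one attribute's level
                i = rng.randrange(len(child))
                child[i] = rng.randint(0, MAX_LEVEL[i])
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=lambda ind: fitness(ind, k))

best = evolve()
print("levels:", best, "loss:", round(information_loss(best), 3))
```

Because the fully generalized solution is placed in the initial population and the better half of each generation is carried over unchanged, the returned levels always satisfy the k-anonymity check in this sketch; the GA then only trades generalization for lower loss. A real system would replace both the constraint and the loss metric with the measures actually used by the protocol.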