The Right to Be Forgotten: Towards Machine Learning on Perturbed Knowledge Bases

Today’s increasingly complex information infrastructures represent the basis of any data-driven industries which are rapidly becoming the 21st century’s economic backbone. The sensitivity of those infrastructures to disturbances in their knowledge bases is therefore of crucial interest for companies, organizations, customers and regulating bodies. This holds true with respect to the direct provisioning of such information in crucial applications like clinical settings or the energy industry, but also when considering additional insights, predictions and personalized services that are enabled by the automatic processing of those data. In the light of new EU Data Protection regulations applying from 2018 onwards which give customers the right to have their data deleted on request, information processing bodies will have to react to these changing jurisdictional (and therefore economic) conditions. Their choices include a re-design of their data infrastructure as well as preventive actions like anonymization of databases per default. Therefore, insights into the effects of perturbed/anonymized knowledge bases on the quality of machine learning results are a crucial basis for successfully facing those future challenges. In this paper we introduce a series of experiments we conducted on applying four different classifiers to an established dataset, as well as several distorted versions of it and present our initial results.

[1]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[2]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[3]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[4]  Jian Pei,et al.  A brief survey on anonymization techniques for privacy preserving publishing of social network data , 2008, SKDD.

[5]  Alina Campan,et al.  Data and Structural k-Anonymity in Social Networks , 2009, PinKDD.

[6]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[7]  Bruce M. Kapron,et al.  Social Network Anonymization via Edge Addition , 2011, 2011 International Conference on Advances in Social Networks Analysis and Mining.

[8]  Shaogang Gong,et al.  Reidentification by Relative Distance Comparison , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Camelia-Mihaela Pintea,et al.  Towards interactive Machine Learning (iML): Applying Ant Colony Algorithms to Solve the Traveling Salesman Problem with the Human-in-the-Loop Approach , 2016, CD-ARES.

[10]  Sean Chester,et al.  k-Anonymization of Social Networks by Vertex Addition , 2011, ADBIS.

[11]  Sushil Jajodia,et al.  Secure Data Management in Decentralized Systems , 2014, Secure Data Management in Decentralized Systems.

[12]  Chris Clifton,et al.  δ-Presence without Complete World Knowledge , 2010, IEEE Transactions on Knowledge and Data Engineering.

[13]  Jiawei Han,et al.  ACM Transactions on Knowledge Discovery from Data: Introduction , 2007 .

[14]  Charu C. Aggarwal,et al.  On k-Anonymity and the Curse of Dimensionality , 2005, VLDB.

[15]  Rajeev Motwani,et al.  Approximation Algorithms for k-Anonymity , 2005 .

[16]  Martin J. Wainwright,et al.  Privacy Aware Learning , 2012, JACM.

[17]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[18]  Andreas Holzinger,et al.  Machine Learning for Health Informatics , 2016, Lecture Notes in Computer Science.

[19]  Edgar R. Weippl,et al.  A tamper-proof audit and control system for the doctor in the loop , 2016, Brain Informatics.