Partial Domain Theories for Privacy

Generalization and suppression are two of the most widely used techniques for achieving k-anonymity. Generalization is also a core concept in machine learning, where it is used to obtain domain models for classification, and suppression is one way to achieve such generalization. In this paper we address the anonymization of data in a way that preserves its usefulness for classification. We propose using machine learning methods to obtain partial domain theories, which are formed by partial descriptions of classes. Unlike in standard machine learning, we require these descriptions to be as specific as possible, i.e., formed by the maximum number of attributes. This is achieved by suppressing some values in some records. Our method uses local suppression: it suppresses a particular value of an attribute only in a subset of records. This avoids one of the problems of global suppression, namely the loss of more information than necessary.
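The difference between global and local suppression can be illustrated with a minimal sketch. It is not the method proposed in the paper, which derives the suppressions from partial class descriptions. The toy records, the helper names (`is_k_anonymous`, `global_suppression`, `local_suppression`, `suppressed_cells`), and the choice to treat every column as a quasi-identifier are all assumptions made for illustration.

```python
from collections import Counter

SUPPRESSED = "*"  # placeholder written in place of a suppressed value


def is_k_anonymous(records, k):
    """Return True if every combination of values occurs in at least k records.

    Assumption: all columns are treated as quasi-identifiers.
    """
    counts = Counter(tuple(r) for r in records)
    return all(c >= k for c in counts.values())


def global_suppression(records, attr):
    """Suppress attribute index `attr` in every record."""
    return [
        tuple(SUPPRESSED if i == attr else v for i, v in enumerate(r))
        for r in records
    ]


def local_suppression(records, attr, indices):
    """Suppress attribute index `attr` only in the records listed in `indices`."""
    chosen = set(indices)
    return [
        tuple(SUPPRESSED if (i == attr and n in chosen) else v
              for i, v in enumerate(r))
        for n, r in enumerate(records)
    ]


def suppressed_cells(records):
    """Count how many individual values were suppressed."""
    return sum(v == SUPPRESSED for r in records for v in r)


# Hypothetical toy data: (age group, zip code).
records = [
    ("30-39", "08001"),
    ("30-39", "08001"),
    ("40-49", "08002"),
    ("40-49", "08002"),
    ("50-59", "08003"),  # unique combination
    ("50-59", "08004"),  # unique combination
]

globally = global_suppression(records, attr=1)
locally = local_suppression(records, attr=1, indices=[4, 5])

print(is_k_anonymous(records, 2))                      # original data
print(is_k_anonymous(globally, 2), suppressed_cells(globally))
print(is_k_anonymous(locally, 2), suppressed_cells(locally))
```

In this toy case both variants reach 2-anonymity, but global suppression removes the zip code from every record, while local suppression removes it only from the two records that were unique.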
