Anonymizing Data with Relational and Transaction Attributes

Publishing datasets about individuals that contain both relational and transaction (i.e., set-valued) attributes is essential to support many applications, ranging from healthcare to marketing. However, preserving the privacy and utility of these datasets is challenging, as it requires (i) guarding against attackers whose knowledge spans both attribute types, and (ii) minimizing the overall information loss. Existing anonymization techniques are not applicable to such datasets, and the problem cannot be tackled with popular multi-objective optimization strategies. This work proposes the first approach to address this problem. Based on this approach, we develop two frameworks that offer privacy with bounded information loss in one attribute type and minimal information loss in the other. To realize each framework, we propose privacy algorithms that effectively preserve data utility, as verified by extensive experiments.
