Query Processing with K-Anonymity

Anonymization techniques are used to ensure the privacy preservation of the data owners, especially for personal and sensitive data. While in most cases, data reside inside the database management system; most of the proposed anonymization techniques operate on and anonymize isolated datasets stored outside the DBMS. Hence, most of the desired functionalities of the DBMS are lost, e.g., consistency, recoverability, and efficient querying. In this paper, we address the challenges involved in enforcing the data privacy inside the DBMS. We implement the k-anonymity algorithm as a relational operator that interacts with other query operators to apply the privacy requirements while querying the data. We study anonymizing a single table, multiple tables, and complex queries that involve multiple predicates. We propose several algorithms to implement the anonymization operator that allow efficient non-blocking and pipelined execution of the query plan. We introduce the concept of k-anonymity view as an abstraction to treat k-anonymity (possibly, with multiple k preferences) as a relational view over the base table(s). For non-static datasets, we introduce the materialized k-anonymity views to ensure preserving the privacy under incremental updates. A prototype system is realized based on PostgreSQL with extended SQL and new relational operators to support anonymity views. The prototype system demonstrates how anonymity views integrate with other privacy-preserving components, e.g., limited retention, limited disclosure, and privacy policy management. Our experiments, on both synthetic and real datasets, illustrate the performance gain from the anonymity views as well as the proposed query optimization techniques under various scenarios.

[1]  Latanya Sweeney,et al.  Guaranteeing anonymity when sharing medical data, the Datafly System , 1997, AMIA.

[2]  Frank McSherry,et al.  Privacy integrated queries: an extensible platform for privacy-preserving data analysis , 2009, SIGMOD Conference.

[3]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[4]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[5]  Rakesh Agrawal,et al.  Extending relational database systems to automatically enforce privacy policies , 2005, 21st International Conference on Data Engineering (ICDE'05).

[6]  Marc Langheinrich,et al.  The platform for privacy preferences 1.0 (p3p1.0) specification , 2002 .

[7]  David J. DeWitt,et al.  Limiting Disclosure in Hippocratic Databases , 2004, VLDB.

[8]  Christos Faloutsos,et al.  Auditing Compliance with a Hippocratic Database , 2004, VLDB.

[9]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[10]  Walid G. Aref,et al.  Realizing Privacy-Preserving Features in Hippocratic Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering Workshop.

[11]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[12]  Jian Pei,et al.  Utility-based anonymization using local recoding , 2006, KDD '06.

[13]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[14]  Ramakrishnan Srikant,et al.  Hippocratic Databases , 2002, VLDB.

[15]  Elisa Bertino,et al.  Efficient k -Anonymization Using Clustering Techniques , 2007, DASFAA.

[16]  Jian Pei,et al.  Maintaining K-Anonymity against Incremental Updates , 2007, 19th International Conference on Scientific and Statistical Database Management (SSDBM 2007).

[17]  Lorrie Faith Cranor,et al.  The platform for privacy preferences , 1999, CACM.

[18]  Alina Campan,et al.  K-anonymization incremental maintenance and optimization techniques , 2007, SAC '07.

[19]  Jorge Lobo,et al.  On the Correctness Criteria of Fine-Grained Access Control in Relational Databases , 2007, VLDB.

[20]  Rakesh Agrawal,et al.  Managing healthcare data hippocratically , 2004, ACM SIGMOD Conference.

[21]  S. Sudarshan,et al.  Fine Grained Authorization Through Predicated Grants , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[22]  Walid G. Aref,et al.  Hippocratic PostgreSQL , 2009, 2009 IEEE 25th International Conference on Data Engineering.