Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge

Privacy is an important issue in data publishing. Many organizations distribute non-aggregate personal data for research, and they must take steps to ensure that an adversary cannot predict, with high confidence, sensitive information pertaining to individuals. This problem is further complicated by the fact that, in addition to the published data, the adversary may have access to other resources (e.g., public records and social networks that relate individuals), which we call external knowledge. A robust privacy criterion should take this external knowledge into consideration. In this paper, we first describe a general framework for reasoning about privacy in the presence of external knowledge. Within this framework, we propose a novel multidimensional approach to quantifying an adversary's external knowledge. This approach allows the publishing organization to investigate privacy threats and enforce privacy requirements in the presence of various types and amounts of external knowledge. Our main technical contributions include a multidimensional privacy criterion that is more intuitive and flexible than previous approaches to modeling background knowledge. In addition, we provide algorithms for measuring disclosure and sanitizing data that improve computational efficiency by several orders of magnitude over the best known techniques.
