Data Publishing: A Two-Stage Approach to Improving Algorithm Efficiency

While the strategy in previous chapter is theoretically superior to existing ones due to its independence of utility measures and privacy models, and its privacy guarantee under publicly-known algorithms, it incurs a high computational complexity. In this chapter, we study an efficient strategy for diversity preserving data publishing with publicly known algorithms (algorithms as side-channel). Our main observation is that a high computational complexity is usually incurred when an algorithm conflates the processes of privacy preservation and utility optimization. We then propose a novel privacy streamliner approach to decouple those two processes for improving algorithm efficiency. More specifically, we first identify a set of potential privacy-preserving solutions satisfying that an adversary’s knowledge about this set itself will not help him/her to violate the privacy property; we can then optimize utility within this set without worrying about privacy breaches since such an optimization is now simulatable by adversaries. To make our approach more concrete, we study it in the context of micro-data release with publicly known generalization algorithms. The analysis and experiments both confirm our algorithms to be more efficient than existing solutions.

[1]  Raymond Chi-Wing Wong,et al.  Minimality Attack in Privacy Preserving Data Publishing , 2007, VLDB.

[2]  Ninghui Li,et al.  Provably Private Data Anonymization: Or, k-Anonymity Meets Differential Privacy , 2011, ArXiv.

[3]  David J. DeWitt,et al.  Mondrian Multidimensional K-Anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[4]  Sushil Jajodia,et al.  Information disclosure under realistic assumptions: privacy versus optimality , 2007, CCS '07.

[5]  Lingyu Wang,et al.  k-jump strategy for preserving privacy in micro-data disclosure , 2010, ICDT '10.

[6]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[7]  Raymond Chi-Wing Wong,et al.  Privacy-Preserving Data Publishing: An Overview , 2010, Privacy-Preserving Data Publishing: An Overview.

[8]  Lingyu Wang,et al.  Privacy streamliner: a two-stage approach to improving algorithm efficiency , 2012, CODASPY '12.

[9]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[10]  Vitaly Shmatikov,et al.  De-anonymizing Social Networks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[11]  Sushil Jajodia,et al.  L-Cover: Preserving Diversity by Anonymity , 2009, Secure Data Management.

[12]  Philip W. L. Fong,et al.  A Privacy Preservation Model for Facebook-Style Social Network Systems , 2009, ESORICS.

[13]  Walid G. Aref,et al.  Supporting views in data stream management systems , 2010, TODS.

[14]  Philip S. Yu,et al.  Privacy-preserving data publishing: A survey of recent developments , 2010, CSUR.

[15]  Philip S. Yu,et al.  Top-down specialization for information and privacy preservation , 2005, 21st International Conference on Data Engineering (ICDE'05).

[16]  S. Ruggles Integrated Public Use Microdata Series , 2021, Encyclopedia of Gerontology and Population Aging.

[17]  Philip S. Yu,et al.  Bottom-up generalization: a data mining solution to privacy protection , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[18]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[19]  Yufei Tao,et al.  Transparent anonymization: Thwarting adversaries who know the algorithm , 2010, TODS.

[20]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[21]  Nina Mishra,et al.  Simulatable auditing , 2005, PODS.