DPCube: Releasing Differentially Private Data Cubes for Health Information

We demonstrate DPCube, a component of our Health Information DE-identification (HIDE) framework, for releasing differentially private data cubes (or multi-dimensional histograms) for sensitive data. HIDE is a framework we developed that integrates heterogeneous structured and unstructured health information and provides methods for privacy-preserving data publishing. The DPCube component uses differentially private access mechanisms and an innovative 2-phase multidimensional partitioning strategy to publish a multi-dimensional data cube or histogram that achieves good utility while satisfying differential privacy. We demonstrate that the released data cubes can serve as a sanitized synopsis of the raw database and, together with an optional synthesized dataset based on the data cubes, can support various Online Analytical Processing (OLAP) queries and learning tasks.
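To make the idea of a differentially private data cube concrete, the following is a minimal sketch of the simplest building block only: perturbing every cell count of a multi-dimensional histogram with Laplace noise. It is not the DPCube algorithm and does not implement the 2-phase partitioning strategy. The function names (`laplace_noise`, `noisy_data_cube`, `range_count`), the attribute domains, and the toy records are all hypothetical, and the sketch assumes that neighboring databases differ by one record, so each cell count has sensitivity 1.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_data_cube(records, domains, epsilon, rng):
    """Return {cell: noisy count} for every cell in the cross product of `domains`.

    Assumption: adding or removing one record changes exactly one cell
    count by 1, so Laplace noise with scale 1/epsilon per cell is used.
    Records whose values fall outside `domains` are ignored.
    """
    true_counts = Counter(records)
    scale = 1.0 / epsilon
    return {
        cell: true_counts.get(cell, 0) + laplace_noise(scale, rng)
        for cell in itertools.product(*domains)
    }


def range_count(cube, predicate):
    # Answer a count query from the released cube alone,
    # without touching the raw records again.
    return sum(count for cell, count in cube.items() if predicate(cell))


# Hypothetical toy data: (age group, diagnosis) pairs.
rng = random.Random(0)
domains = [("<40", "40-64", "65+"), ("flu", "diabetes")]
records = (
    [("<40", "flu")] * 30
    + [("40-64", "diabetes")] * 50
    + [("65+", "diabetes")] * 20
)

cube = noisy_data_cube(records, domains, epsilon=1.0, rng=rng)
diabetes_estimate = range_count(cube, lambda cell: cell[1] == "diabetes")
```

Here `diabetes_estimate` approximates the true count of 70, with noise accumulated from the three cells it sums over. That accumulation across many cells is one reason a partitioning strategy, rather than independent cell-level noise, can improve utility for range queries.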
