Differential Privacy for the Vast Majority

Differential privacy has become one of the most widely used mechanisms for protecting sensitive information in databases and information systems. Although differential privacy provides a clear, quantifiable privacy guarantee, it implicitly assumes that each individual corresponds to a single record in the result of a database query. This assumption does not hold in many database query applications. When an individual has multiple records, a strict implementation of differential privacy may cause significant information loss. In this study, we extend the differential privacy principle to situations where multiple records in a database are associated with the same individual. We propose a new privacy principle that integrates differential privacy with the Pareto principle in analyzing privacy risk and data utility. When applied to situations with multiple records per person, the proposed approach can significantly reduce the information loss in the released query results at the cost of a relatively small relaxation of the differential privacy guarantee. The effectiveness of the proposed approach is evaluated using three real-world databases.
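
To make the trade-off concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes a simple count query released with the standard Laplace mechanism, where the noise scale is the sensitivity divided by epsilon. Under strict protection of every individual, the sensitivity is the largest number of records held by any one person. The `coverage` parameter, the function names, and the rule of taking a records-per-person bound that covers only a chosen fraction of individuals are all illustrative assumptions about how a Pareto-style relaxation might look.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def group_sensitivity(owner_ids, coverage=1.0):
    """Smallest records-per-person bound that covers `coverage` of individuals.

    coverage=1.0 gives the strict bound (the maximum records held by anyone).
    A lower coverage is a hypothetical relaxation, not the paper's definition.
    """
    counts = sorted(Counter(owner_ids).values())
    k = max(1, math.ceil(coverage * len(counts)))
    return counts[k - 1]


def noisy_count(owner_ids, epsilon, coverage=1.0, rng=None):
    """Return (noisy record count, Laplace scale used) for a count query."""
    rng = rng or random.Random()
    scale = group_sensitivity(owner_ids, coverage) / epsilon
    return len(owner_ids) + laplace_noise(scale, rng), scale


# Toy data: one entry per record, labelled with the (hypothetical) owner.
ids = ["a"] + ["b"] * 2 + ["c"] * 2 + ["d"] * 20

_, strict_scale = noisy_count(ids, epsilon=0.5)
_, relaxed_scale = noisy_count(ids, epsilon=0.5, coverage=0.75)
print(strict_scale, relaxed_scale)
```

In this toy data a single heavy contributor drives the strict noise scale, while a bound covering three of the four individuals needs far less noise. The price is that the individual above the bound receives a weaker guarantee than the nominal epsilon, which is the kind of relaxation the abstract refers to.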
