Prior-itizing Privacy: A Bayesian Approach to Setting the Privacy Budget in Differential Privacy

When releasing outputs from confidential data, agencies need to balance the analytical usefulness of the released data with the obligation to protect data subjects' confidentiality. For releases satisfying differential privacy, this balance is reflected by the parameter epsilon, known as the privacy budget. In practice, it can be difficult for agencies to select and interpret epsilon. We use Bayesian posterior probabilities of disclosure to provide a framework for setting epsilon. The agency decides how much posterior risk it is willing to accept in a data release at various levels of prior risk. Using a mathematical relationship among these probabilities and epsilon, the agency selects the maximum epsilon that ensures the posterior-to-prior ratios are acceptable for all values of prior disclosure risk. The framework applies to any differentially private mechanism.
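As a rough illustration of how a risk profile can be turned into a privacy budget, the sketch below uses the standard posterior-odds bound for pure epsilon-differential privacy: posterior odds are at most e^epsilon times prior odds, so a prior p yields a posterior of at most e^epsilon p / (1 - p + e^epsilon p). This bound is an assumption made for illustration; the exact relationship derived in the paper may differ. The names `max_epsilon`, `worst_case_posterior`, and `risk_profile` are hypothetical, as are the constant ratio of 3 and the grid of priors.

```python
import math

def worst_case_posterior(p, eps):
    """Upper bound on posterior disclosure probability under pure eps-DP,
    assuming posterior odds <= exp(eps) * prior odds (illustrative bound)."""
    return math.exp(eps) * p / (1 - p + math.exp(eps) * p)

def max_epsilon(risk_profile, priors):
    """Largest eps whose worst-case posterior-to-prior ratio stays within
    risk_profile(p) for every prior p in priors (each p strictly in (0, 1))."""
    eps = math.inf
    for p in priors:
        r = risk_profile(p)  # acceptable posterior / prior ratio at this prior
        if r < 1:
            raise ValueError("acceptable ratio must be at least 1")
        if r * p >= 1:
            continue  # allowed posterior is >= 1, so this prior imposes no limit
        # Solve exp(eps) * p / (1 - p + exp(eps) * p) <= r * p for eps.
        eps = min(eps, math.log(r * (1 - p) / (1 - r * p)))
    return eps

# Hypothetical profile: the agency tolerates at most a threefold increase in risk.
priors = [i / 1000 for i in range(1, 1000)]
eps = max_epsilon(lambda p: 3.0, priors)
print(f"maximum epsilon: {eps:.4f}")
```

With a constant acceptable ratio, the binding constraint comes from the smallest prior on the grid, so the selected epsilon is slightly above log(3). A profile that tolerates larger ratios at small priors and smaller ratios at large priors would shift the binding constraint elsewhere.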
