Differentially Private Learning of Undirected Graphical Models Using Collective Graphical Models

We investigate the problem of learning discrete, undirected graphical models in a differentially private way. We show that the approach of releasing noisy sufficient statistics using the Laplace mechanism achieves a good trade-off between privacy, utility, and practicality. A naive learning algorithm that uses the noisy sufficient statistics "as is" outperforms general-purpose differentially private learning algorithms. However, it has three limitations: it ignores knowledge about the data generating process, rests on uncertain theoretical foundations, and exhibits certain pathologies. We develop a more principled approach that applies the formalism of collective graphical models to perform inference over the true sufficient statistics within an expectation-maximization framework. We show that this approach learns better models than competing approaches on both synthetic data and real human mobility data, which we use as a case study.
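
To make the release step concrete, the following is a minimal sketch of applying the Laplace mechanism to the sufficient statistics of a pairwise discrete model, taken here to be the contingency table of each edge. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function name `noisy_pairwise_counts`, the data layout, and the sensitivity bound of 2 per edge table (neighboring data sets differ by replacing one record) are all assumptions made for this example.

```python
import numpy as np

def noisy_pairwise_counts(data, edges, num_states, epsilon, rng):
    """Release Laplace-perturbed edge contingency tables (hypothetical helper).

    data:       (N, V) integer array with entries in [0, num_states)
    edges:      list of (i, j) variable-index pairs defining the model's edges
    epsilon:    total privacy budget spent on all tables together
    """
    # Assumption: replacing one record moves one unit of count between two
    # cells of each edge table, so the L1 sensitivity is 2 per table.
    sensitivity = 2.0 * len(edges)
    scale = sensitivity / epsilon
    noisy = {}
    for (i, j) in edges:
        table = np.zeros((num_states, num_states))
        # Accumulate joint counts of (x_i, x_j) over all records.
        np.add.at(table, (data[:, i], data[:, j]), 1.0)
        # Add independent Laplace noise to every cell.
        noisy[(i, j)] = table + rng.laplace(0.0, scale, size=table.shape)
    return noisy

rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(1000, 4))   # synthetic records, 4 variables
edges = [(0, 1), (1, 2), (2, 3)]            # a chain-structured model
noisy = noisy_pairwise_counts(data, edges, num_states=3, epsilon=1.0, rng=rng)
```

The perturbed tables are real-valued and can contain negative entries, and tables that share a variable need not agree on that variable's marginal. A learner that plugs them in "as is" must tolerate this, whereas the inference-based approach treats the true counts as latent variables.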
