Locally Private Bayesian Inference for Count Models

As more aspects of social interaction are digitally recorded, there is a growing need to develop privacy-preserving data analysis methods. Social scientists will be more likely to adopt these methods if doing so entails minimal change to their current methodology. Toward that end, we present a general and modular method for privatizing Bayesian inference for Poisson factorization, a broad class of models that contains some of the most widely used models in the social sciences. Our method satisfies local differential privacy, which ensures that no single centralized server need ever store the non-privatized data. To formulate our local-privacy guarantees, we introduce and focus on limited-precision local privacy, the local privacy analog of limited-precision differential privacy (Flood et al., 2013). We present two case studies, one involving social networks and one involving text corpora, that test our method's ability to form the posterior distribution over latent variables under different levels of noise, and demonstrate our method's utility over a naïve approach, wherein inference proceeds as usual, treating the privatized data as if it were not privatized.

[1]  Max Welling,et al.  Distributed Algorithms for Topic Models , 2009, J. Mach. Learn. Res..

[2]  John F. Canny,et al.  GaP: a factor model for discrete data , 2004, SIGIR '04.

[3]  J. G. Skellam The frequency distribution of the difference between two Poisson variates belonging to different populations. , 1946, Journal of the Royal Statistical Society. Series A.

[4]  David M. Blei,et al.  Dynamic Poisson Factorization , 2015, RecSys.

[5]  M E J Newman,et al.  Modularity and community structure in networks. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[6]  A. Ihler,et al.  On the Theory and Practice of Privacy-Preserving Bayesian Data Analysis , 2016 .

[7]  Aleksandra B. Slavkovic,et al.  Differentially Private Exponential Random Graphs , 2014, Privacy in Statistical Databases.

[8]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[9]  Chong Wang,et al.  Stochastic variational inference , 2012, J. Mach. Learn. Res..

[10]  Mingyuan Zhou,et al.  Infinite Edge Partition Models for Overlapping Community Detection and Link Prediction , 2015, AISTATS.

[11]  Yiming Yang,et al.  The Enron Corpus: A New Dataset for Email Classification Research , 2004 .

[12]  J. Davis Univariate Discrete Distributions , 2006 .

[13]  Tim Roughgarden,et al.  Universally utility-maximizing privacy mechanisms , 2008, STOC '09.

[14]  Christos Dimitrakakis,et al.  Differential Privacy for Bayesian Inference through Posterior Sampling , 2017, J. Mach. Learn. Res..

[15]  M. Stephens,et al.  Inference of population structure using multilocus genotype data: dominant markers and null alleles , 2007, Molecular ecology notes.

[16]  David M Blei,et al.  Efficient discovery of overlapping communities in massive networks , 2013, Proceedings of the National Academy of Sciences.

[17]  Kamalika Chaudhuri,et al.  Privacy-preserving logistic regression , 2008, NIPS.

[18]  Michael I. Jordan,et al.  Bayesian Nonnegative Matrix Factorization with Stochastic Variational Inference , 2014, Handbook of Mixed Membership Models and Their Applications.

[19]  Christopher D. Manning,et al.  Topic Modeling for the Social Sciences , 2009 .

[20]  Christos Dimitrakakis,et al.  Robust and Private Bayesian Inference , 2013, ALT.

[21]  Joydeep Ghosh,et al.  Nonparametric Bayesian Factor Analysis for Dynamic Count Matrices , 2015, AISTATS.

[22]  Lawrence Carin,et al.  Augment-and-Conquer Negative Binomial Processes , 2012, NIPS.

[23]  Max Welling,et al.  Positive tensor factorization , 2001, Pattern Recognit. Lett..

[24]  Mingyuan Zhou,et al.  Poisson-Gamma dynamical systems , 2016, NIPS.

[25]  Jonathan Katz,et al.  Cryptography and the Economics of Supervisory Information: Balancing Transparency and Confidentiality , 2013 .

[26]  David M. Blei,et al.  Bayesian Poisson Tucker Decomposition for Learning the Structure of International Relations , 2016, ICML.

[27]  Frank McSherry,et al.  Probabilistic Inference and Differential Privacy , 2010, NIPS.

[28]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[29]  Morten Mørup,et al.  Nonparametric Bayesian modeling of complex networks: an introduction , 2013, IEEE Signal Processing Magazine.

[30]  David M. Blei,et al.  Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts , 2015, KDD.

[31]  J. Besag On the Statistical Analysis of Dirty Pictures , 1986 .

[32]  Stephen T. Joy The Differential Privacy of Bayesian Inference , 2015 .

[33]  Timothy Baldwin,et al.  Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality , 2014, EACL.

[34]  J. Kalbfleisch,et al.  On the Bessel Distribution and Related Problems , 2000 .

[35]  Aleks Jakulin,et al.  Applying Discrete PCA in Data Analysis , 2004, UAI.

[36]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[37]  A. W. Kemp,et al.  Univariate Discrete Distributions , 1993 .

[38]  David M. Blei,et al.  Deep Exponential Families , 2014, AISTATS.

[39]  Ali Taylan Cemgil,et al.  Bayesian Inference for Nonnegative Matrix Factorisation Models , 2009, Comput. Intell. Neurosci..

[40]  Leto Peel,et al.  The ground truth about metadata and community detection in networks , 2016, Science Advances.

[41]  Catuscia Palamidessi,et al.  Geo-indistinguishability: differential privacy for location-based systems , 2012, CCS.

[42]  J. Hoef Who Invented the Delta Method , 2012 .

[43]  Chong Wang,et al.  Variational inference in nonconjugate models , 2012, J. Mach. Learn. Res..

[44]  Jon D. McAuliffe,et al.  Variational Inference for Large-Scale Models of Discrete Choice , 2007, 0712.2526.

[45]  Tao Sun,et al.  Differentially Private Learning of Undirected Graphical Models Using Collective Graphical Models , 2017, ICML.

[46]  Vitaly Shmatikov,et al.  De-anonymizing Social Networks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[47]  L. Devroye SIMULATING BESSEL RANDOM VARIABLES , 2002 .

[48]  Andreas Haeberlen,et al.  DStress: Efficient Differentially Private Computations on Distributed Data , 2017, EuroSys.

[49]  Alexander J. Smola,et al.  Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo , 2015, ICML.

[50]  James R. Foulds,et al.  Private Topic Modeling , 2016, ArXiv.

[51]  Gerlof Bouma,et al.  Normalized (pointwise) mutual information in collocation extraction , 2009 .

[52]  Tamara G. Kolda,et al.  On Tensors, Sparsity, and Nonnegative Factorizations , 2011, SIAM J. Matrix Anal. Appl..

[53]  Justin Grimmer,et al.  Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts , 2013, Political Analysis.

[54]  Michalis K. Titsias,et al.  The Infinite Gamma-Poisson Feature Model , 2007, NIPS.

[55]  Mingyuan Zhou,et al.  The Poisson Gamma Belief Network , 2015, NIPS.

[56]  P. Donnelly,et al.  Inference of population structure using multilocus genotype data. , 2000, Genetics.

[57]  Mark E. J. Newman,et al.  An efficient and principled method for detecting communities in networks , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[58]  Stephen E. Fienberg,et al.  Differential Privacy for Protecting Multi-dimensional Contingency Table Data: Extensions and Applications , 2012, J. Priv. Confidentiality.

[59]  James R. Foulds,et al.  On the Theory and Practice of Privacy-Preserving Bayesian Data Analysis , 2016, UAI.

[60]  Fan Chung,et al.  Spectral Graph Theory , 1996 .

[61]  Margaret E. Roberts,et al.  The structural topic model and applied social science , 2013, ICONIP 2013.

[62]  Andrew McCallum,et al.  Optimizing Semantic Coherence in Topic Models , 2011, EMNLP.