Improved Differentially Private Analysis of Variance

Abstract: Hypothesis testing is one of the most common types of data analysis and forms the backbone of scientific research in many disciplines. Analysis of variance (ANOVA) in particular is used to detect dependence between a categorical and a numerical variable. Here we show how to carry out this hypothesis test under the restrictions of differential privacy. We show that the F-statistic, the optimal test statistic in the public setting, is no longer optimal in the private setting, and we develop a new test statistic, F1, with much higher statistical power. We show how to rigorously compute a reference distribution for the F1 statistic and give an algorithm that outputs accurate p-values. We implement our test and experimentally optimize several parameters. We then compare our test to the only previous work on private ANOVA testing, using the same effect size as that work. We see an order of magnitude improvement, with our test requiring only 7% as much data to detect the effect.
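For orientation, the classical (public, non-private) F-statistic mentioned in the abstract can be sketched as follows. This is only the textbook one-way ANOVA baseline; it is not the F1 statistic and adds no privacy noise. The function name `f_statistic` and the list-of-lists input layout are illustrative assumptions, not taken from the paper.

```python
from statistics import fmean


def f_statistic(groups):
    """Textbook one-way ANOVA F-statistic (no privacy protection).

    `groups` is a list of lists: one list of numerical observations
    per category. The name and input layout are illustrative assumptions.
    """
    k = len(groups)                      # number of categories
    n = sum(len(g) for g in groups)      # total number of observations
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]

    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    # Ratio of mean squares; large values suggest the group means differ.
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For example, `f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]])` evaluates to approximately 13, since the between-group mean square is 13 and the within-group mean square is 1.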
