Differentially Private Nonparametric Hypothesis Testing

Hypothesis tests are a crucial statistical tool for data mining and are the workhorse of scientific research in many fields. Here we study differentially private tests of independence between a categorical and a continuous variable. We take as our starting point traditional nonparametric tests, which require no assumptions (e.g., normality) about the distribution of the data. We present private analogues of the Kruskal-Wallis, Mann-Whitney, and Wilcoxon signed-rank tests, as well as the parametric one-sample t-test. These tests use novel test statistics developed specifically for the private setting. We compare our tests to prior work on both parametric and nonparametric tests. We find that in all cases our new nonparametric tests achieve large improvements in statistical power, even when the assumptions of parametric tests are met.
