A Differentially Private Wilcoxon Signed-Rank Test

Hypothesis tests are a crucial statistical tool for data mining and are the workhorse of scientific research in many fields. Here we present a differentially private analogue of the classic Wilcoxon signed-rank hypothesis test, which is used to compare sets of paired (e.g., before-and-after) data values. We present not only a private estimate of the test statistic, but also a method to accurately compute a p-value and assess statistical significance. We evaluate our test on both simulated and real data. Compared to the only existing private test for this setting, that of Task and Clifton, we find that our test requires less than half as much data to achieve the same statistical power.
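To make the two ingredients above concrete (a private test statistic and a p-value for it), the following is a minimal sketch and not the algorithm of the paper. It assumes the generic Laplace mechanism, no zero differences and no tied absolute differences, a publicly known sample size n, and a sensitivity bound of 2n for the signed-rank statistic under replacement of one pair. The function names, the `trials` parameter, and the Monte Carlo p-value procedure are all illustrative choices.

```python
import random


def signed_rank_statistic(before, after):
    """Classic Wilcoxon statistic W = sum of sign(d_i) * rank(|d_i|).

    Assumption: no zero differences and no tied |d_i| (illustration only).
    """
    diffs = [a - b for b, a in zip(before, after)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    w = 0
    for rank, i in enumerate(order, start=1):
        w += rank if diffs[i] > 0 else -rank
    return w


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_signed_rank_test(before, after, epsilon, trials=2000, seed=None):
    """Return (noisy statistic, two-sided Monte Carlo p-value).

    Assumption: changing one pair moves W by at most 2n when there are
    no ties, so Laplace noise of scale 2n / epsilon is added.
    """
    rng = random.Random(seed)
    n = len(before)
    scale = 2 * n / epsilon
    noisy_w = signed_rank_statistic(before, after) + laplace_noise(scale, rng)

    # Under the null, each rank 1..n carries an independent random sign.
    # Simulating that plus fresh noise uses only n and epsilon, not the data.
    extreme = 0
    for _ in range(trials):
        null_w = sum(r if rng.random() < 0.5 else -r for r in range(1, n + 1))
        null_w += laplace_noise(scale, rng)
        if abs(null_w) >= abs(noisy_w):
            extreme += 1
    p_value = (extreme + 1) / (trials + 1)
    return noisy_w, p_value
```

Because the null simulation includes the same noise distribution as the released statistic, the p-value accounts for the added noise rather than treating the noisy statistic as if it were exact. Handling zeros and ties would need a rule such as those discussed by Pratt or Conover, which this sketch omits.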
