Multivariate Mean Comparison under Differential Privacy

The comparison of multivariate population means is a central task of statistical inference. While statistical theory provides a variety of analysis tools, they usually do not protect individuals' privacy. This lack of protection can create incentives for participants in a study to conceal their true data (especially for outliers), which might result in a distorted analysis. In this paper we address this problem by developing a hypothesis test for multivariate mean comparisons that guarantees differential privacy to users. The test statistic is based on the popular Hotelling's T² statistic, which has a natural interpretation in terms of the Mahalanobis distance. In order to control the type-1 error, we present a bootstrap algorithm under differential privacy that provably yields a reliable test decision. In an empirical study we demonstrate the applicability of this approach.
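To make the quantities in the abstract concrete, the following is a minimal sketch of the classical (non-private) two-sample Hotelling's T² statistic and its connection to the Mahalanobis distance, followed by a purely illustrative private release via the standard Laplace mechanism. This is not the paper's algorithm: the paper calibrates the test with a differentially private bootstrap, which is not implemented here. The function name `hotelling_t2`, the privacy budget `epsilon`, and the sensitivity bound `sens` are assumptions introduced only for this sketch.

```python
import numpy as np

def hotelling_t2(x, y):
    """Two-sample Hotelling's T^2 for samples x (n x d) and y (m x d)."""
    n, m = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled sample covariance of the two groups.
    s_pooled = ((n - 1) * np.cov(x, rowvar=False) +
                (m - 1) * np.cov(y, rowvar=False)) / (n + m - 2)
    # Squared Mahalanobis distance between the two sample means.
    d2 = diff @ np.linalg.solve(s_pooled, diff)
    return (n * m) / (n + m) * d2

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 3))
y = rng.normal(0.2, 1.0, size=(150, 3))
t2 = hotelling_t2(x, y)

# Illustrative (epsilon, 0)-DP release of the statistic via the Laplace
# mechanism. The sensitivity bound `sens` is a placeholder assumption; in
# practice it must be derived, e.g. from clipping the data to a bounded range.
epsilon, sens = 1.0, 5.0
t2_private = t2 + rng.laplace(scale=sens / epsilon)
print(t2, t2_private)
```

In this sketch the test decision would still have to be calibrated: comparing the noisy statistic against the classical F-distribution cutoff ignores the added noise, which is exactly the issue the paper's differentially private bootstrap is designed to address.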
