Privately detecting changes in unknown distributions

The change-point detection problem seeks to identify distributional changes in streams of data. Increasingly, tools for change-point detection are applied in settings where data may be highly sensitive and formal privacy guarantees are required, such as identifying disease outbreaks from hospital records or detecting activity within a home using IoT devices. Differential privacy has emerged as a powerful technique for enabling data analysis while preventing information leakage about individuals. Much of the prior work on change-point detection, including the only private algorithms for this problem, requires complete knowledge of the pre-change and post-change distributions. This assumption is unrealistic for many practical applications. This work develops differentially private algorithms for solving the change-point problem when the data distributions are unknown. Additionally, the data may be sampled from distributions that change smoothly over time, rather than from fixed pre-change and post-change distributions. We apply our algorithms to detect changes in the linear trends of such data streams. Finally, we provide experimental results that empirically validate the performance of our algorithms.
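To make the setting concrete, the following is a minimal sketch of one way a change point could be estimated privately without knowing either distribution. It is an illustration under stated assumptions, not the algorithm developed in this work: it scores each candidate split with a rank-based (Mann-Whitney style) statistic, adds Laplace noise, and reports the noisy maximizer. The function names, the `gamma` parameter that keeps candidates away from the ends of the stream, and the choice of noise scale `2 * sensitivity / epsilon` are all assumptions made for this example.

```python
import math
import random


def mann_whitney_split(xs, k):
    """Fraction of pairs (a in xs[:k], b in xs[k:]) with a < b; ties count 1/2."""
    left, right = xs[:k], xs[k:]
    wins = 0.0
    for a in left:
        for b in right:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(left) * len(right))


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_change_point(xs, epsilon, gamma=0.2, rng=None):
    """Hypothetical sketch: noisy argmax of |V(k) - 1/2| over candidate splits k.

    Assumption: neighboring streams differ in one entry. Replacing one entry
    changes V(k) by at most 1 / min(k, n - k), so restricting k to the middle
    of the stream bounds the sensitivity of every score.
    """
    rng = rng or random.Random()
    n = len(xs)
    lo = max(1, math.ceil(gamma * n))
    hi = min(n - 1, math.floor((1 - gamma) * n))
    if lo > hi:
        raise ValueError("no candidate split points; decrease gamma or add data")
    sensitivity = 1.0 / min(lo, n - hi)
    scale = 2.0 * sensitivity / epsilon  # report-noisy-max with general queries
    best_k, best_score = lo, -math.inf
    for k in range(lo, hi + 1):
        score = abs(mann_whitney_split(xs, k) - 0.5) + laplace(scale, rng)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Example: a stream whose distribution shifts after the 50th observation.
data_rng = random.Random(0)
stream = [data_rng.gauss(0, 1) for _ in range(50)]
stream += [data_rng.gauss(5, 1) for _ in range(50)]
estimate = private_change_point(stream, epsilon=1.0, rng=random.Random(1))
print("estimated change point:", estimate)
```

The statistic uses only the relative order of observations, which is why no knowledge of the pre-change or post-change distribution is needed. The noise scale grows as `gamma` shrinks, so the sketch trades coverage of candidate change points against accuracy.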
