Model Diagnostics for Remote Access Regression Servers

One way to protect confidential microdata is to deny users direct access to the records. Instead, users submit analyses to a remote server, which reports back basic output from the fitted model, such as coefficients and standard errors. To be most useful, the server should also give users some way to check the fit of their models without disclosing actual data values. This paper discusses regression diagnostics for such remote servers. The proposal is to release synthetic diagnostics (i.e., simulated values of residuals and of dependent and independent variables) constructed to mimic the relationships among the real-data residuals and independent variables. Simulations show that the proposed synthetic diagnostics can reveal model inadequacies without substantially increasing the risk of disclosure. The approach can also be used to develop remote server diagnostics for generalized linear models.
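The idea of a synthetic residual plot can be illustrated with a minimal sketch. It is not the algorithm from the paper; it is a simplified stand-in under stated assumptions. The function names (`fit_ols`, `synthetic_residual_plot`), the Gaussian jitter on the predictor, the quantile binning, and the `n_bins` and `jitter` parameters are all hypothetical choices made for illustration. The server fits the model on the real data, then releases simulated (predictor, residual) pairs whose trend and spread roughly follow those of the real residuals, while no real value is released.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients and residuals for y ~ [1, X]."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, y - design @ beta


def synthetic_residual_plot(x, resid, rng, n_bins=10, jitter=0.1):
    """Simulate (x, residual) pairs that mimic the real residual trend.

    Assumed scheme (illustrative only): jitter each x with Gaussian noise,
    then draw each synthetic residual from a normal distribution whose mean
    and standard deviation are those of the real residuals in the quantile
    bin that the synthetic x falls into.
    """
    x_syn = x + rng.normal(0.0, jitter * x.std(), size=x.shape)
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))

    def to_bin(values):
        # Map values to bin indices; out-of-range values use the end bins.
        idx = np.searchsorted(edges, values, side="right") - 1
        return np.clip(idx, 0, n_bins - 1)

    real_bin = to_bin(x)
    bin_mean = np.array([resid[real_bin == b].mean() for b in range(n_bins)])
    bin_sd = np.array([resid[real_bin == b].std() for b in range(n_bins)])

    syn_bin = to_bin(x_syn)
    r_syn = rng.normal(bin_mean[syn_bin], bin_sd[syn_bin])
    return x_syn, r_syn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated "confidential" data with a quadratic effect the analyst omits.
    x = rng.uniform(-2.0, 2.0, 500)
    y = 1.0 + x + x**2 + rng.normal(0.0, 0.5, 500)

    _, resid = fit_ols(x, y)  # misspecified linear fit
    x_syn, r_syn = synthetic_residual_plot(x, resid, rng)

    # The synthetic pairs should still show curvature against x.
    print(np.corrcoef(x_syn**2, r_syn)[0, 1])
```

In this toy setting the analyst fits a straight line to data with a quadratic effect. A plot of `r_syn` against `x_syn` would be expected to show the same U-shaped pattern as the real residual plot, which is the kind of model inadequacy the abstract says synthetic diagnostics can reveal. How much disclosure protection such a scheme provides depends on the noise and binning choices, and the sketch does not assess it.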
