Secure Statistical Analysis of Distributed Databases, Emphasizing What We Don't Know

Over the past several years, the National Institute of Statistical Sciences (NISS) has developed methodology for performing statistical analyses that, in effect, integrate data held in multiple, distributed databases without literally bringing the data together in one place. In this paper, we summarize that research but focus on issues that are not yet understood. These include the inability to perform exploratory analyses and visualizations, protection against dishonest participants, inequities among database owners, and the lack of measures of risk and utility.
