"Secure" Logistic Regression of Horizontally and Vertically Partitioned Distributed Databases

Privacy-preserving data mining (PPDM) techniques aim to construct efficient data mining algorithms while maintaining privacy. Statistical disclosure limitation (SDL) techniques aim to preserve confidentiality but, in contrast to PPDM techniques, also aim to provide access to the statistical data needed for "full" statistical analysis. We draw from both the PPDM and SDL paradigms and address the problem of performing a "secure" logistic regression on pooled data collected separately by several parties without directly combining their databases. We describe a "secure" Newton-Raphson protocol for binary logistic regression on horizontally and vertically partitioned databases using secure multi-party computation.
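The horizontally partitioned case can be illustrated with a minimal sketch. It assumes that each party holds complete rows (all covariates plus the binary response) for its own records. At every Newton-Raphson iteration each party computes its local gradient X_k^T(y_k - p_k) and local Hessian X_k^T W_k X_k, and the parties add these up through a simplified ring-based secure summation with a random additive mask. The function names (`local_stats`, `secure_sum`, `solve`, `secure_newton_raphson`) are hypothetical, and this is not the paper's exact protocol. A real implementation would perform the masking in modular integer arithmetic rather than floating point. The vertically partitioned case, where parties hold different columns, requires secure inner-product or matrix-product subprotocols and is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_stats(X, y, beta):
    """One party's contribution: gradient X^T(y - p) and Hessian X^T W X,
    with W = diag(p(1 - p)), flattened into a single vector."""
    d = len(beta)
    grad = [0.0] * d
    hess = [[0.0] * d for _ in range(d)]
    for xi, yi in zip(X, y):
        pi = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = pi * (1.0 - pi)
        for a in range(d):
            grad[a] += xi[a] * (yi - pi)
            for b in range(d):
                hess[a][b] += w * xi[a] * xi[b]
    return grad + [h for row in hess for h in row]


def secure_sum(vectors, rng):
    """Simplified ring secure summation (assumption: floats, not modular ints).
    Party 0 adds a random mask it alone knows; each later party adds its own
    vector to the masked running total; party 0 finally removes the mask."""
    mask = [rng.uniform(-1e3, 1e3) for _ in vectors[0]]
    running = [m + v for m, v in zip(mask, vectors[0])]
    for vec in vectors[1:]:
        # This party only ever sees a masked total, never another party's vector.
        running = [r + v for r, v in zip(running, vec)]
    return [r - m for r, m in zip(running, mask)]


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small d x d system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def secure_newton_raphson(parties, d, iters=25, tol=1e-8, seed=0):
    """parties: list of (X_k, y_k) row blocks; d: number of coefficients."""
    rng = random.Random(seed)
    beta = [0.0] * d
    for _ in range(iters):
        total = secure_sum([local_stats(X, y, beta) for X, y in parties], rng)
        grad = total[:d]
        hess = [total[d + a * d: d + (a + 1) * d] for a in range(d)]
        step = solve(hess, grad)  # Newton step: H^{-1} X^T (y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

Running `secure_newton_raphson` on two parties' row blocks should give the same coefficients as running it on the pooled rows, up to floating-point error. In this sketch only the aggregate gradient and Hessian are disclosed at each iteration, never any party's individual records.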
