"Secure" Logistic Regression of Horizontally and Vertically Partitioned Distributed Databases

Privacy-preserving data mining (PPDM) techniques aim to construct efficient data mining algorithms while maintaining privacy. Statistical disclosure limitation (SDL) techniques aim to preserve confidentiality, but, in contrast to PPDM techniques, they also aim to provide access to the statistical data needed for "full" statistical analysis. We draw from both the PPDM and SDL paradigms and address the problem of performing a "secure" logistic regression on pooled data collected separately by several parties, without directly combining their databases. We describe a "secure" Newton-Raphson protocol for binary logistic regression in the case of horizontally and vertically partitioned databases, using secure multi-party computation.
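The horizontally partitioned case can be illustrated with a minimal sketch. It assumes semi-honest parties and a ring-based secure-summation protocol, which is one common choice and not necessarily the exact protocol described here. Each record lives entirely at one party, so the score X^T (y - p) and the information matrix X^T W X, with W = diag(p_i (1 - p_i)), are sums of party-local terms. As a result, every Newton-Raphson step beta_new = beta + (X^T W X)^{-1} X^T (y - p) needs only secure sums. The names `MODULUS`, `SCALE`, `secure_sum`, `local_stats`, `solve`, and `secure_newton_raphson` are hypothetical, and all parties are simulated in a single process. The vertically partitioned case, where parties hold different columns, needs a different primitive (for example, secure matrix products) and is not shown.

```python
import math
import random

# Hypothetical sketch parameters: a large modulus for the ring arithmetic and a
# fixed-point scale for encoding real-valued statistics as integers.
MODULUS = 2 ** 64
SCALE = 10 ** 9

def encode(x):
    """Fixed-point encode a real number as a residue mod MODULUS."""
    return round(x * SCALE) % MODULUS

def decode(r):
    """Invert encode(); residues above MODULUS // 2 stand for negative values."""
    if r > MODULUS // 2:
        r -= MODULUS
    return r / SCALE

def secure_sum(local_values, rng):
    """Ring-based secure summation (simulated in one process).

    Party 1 adds a uniformly random mask to its value. Each later party adds
    its own value mod MODULUS and passes the total on, so intermediate parties
    see only a masked residue. Party 1 finally removes the mask.
    """
    mask = rng.randrange(MODULUS)
    running = (mask + encode(local_values[0])) % MODULUS
    for v in local_values[1:]:
        running = (running + encode(v)) % MODULUS
    return decode((running - mask) % MODULUS)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_stats(X, y, beta):
    """One party's share of the score X^T (y - p) and information X^T W X."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        mu = sigmoid(sum(b * x for b, x in zip(beta, xi)))
        w = mu * (1.0 - mu)
        for j in range(k):
            grad[j] += xi[j] * (yi - mu)
            for m in range(k):
                info[j][m] += w * xi[j] * xi[m]
    return grad, info

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def secure_newton_raphson(parties, k, max_iter=25, tol=1e-8, seed=0):
    """Newton-Raphson for binary logistic regression on horizontally
    partitioned data; parties is a list of (X, y) pairs with k columns each."""
    rng = random.Random(seed)
    beta = [0.0] * k
    for _ in range(max_iter):
        stats = [local_stats(X, y, beta) for X, y in parties]
        # Only the aggregated score and information matrix are revealed.
        g = [secure_sum([s[0][j] for s in stats], rng) for j in range(k)]
        H = [[secure_sum([s[1][j][m] for s in stats], rng) for m in range(k)]
             for j in range(k)]
        delta = solve(H, g)  # Newton step: (X^T W X)^{-1} X^T (y - p)
        beta = [b + d for b, d in zip(beta, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return beta
```

In this sketch, no party sees another party's individual records or local sums. However, the aggregate score and information matrix (and hence the current coefficient estimate) are revealed to everyone at each iteration, which is the usual trade-off in summation-based protocols of this kind. For example, with three parties each holding rows `[1, x]` (intercept plus one covariate) and binary labels, `secure_newton_raphson(parties, 2)` returns the same coefficients, up to fixed-point rounding, as fitting the pooled data directly.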
