Privacy preserving linear regression modeling of distributed databases

Statistical analysis is an important tool in the data mining field. Little work has investigated how statistical analysis can be performed when datasets are distributed among a number of data owners. For confidentiality or other proprietary reasons, data owners are reluctant to share their data with others, yet they still wish to perform statistical analysis cooperatively. We address the tradeoff between privacy and global statistical analysis, taking linear regression as the example, and present a privacy-preserving linear regression model based on a fully homomorphic encryption scheme.
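To make the setting concrete, the following is a minimal plaintext sketch, not the protocol of this paper. It assumes the data are horizontally partitioned (each owner holds different rows of the same attributes), which the abstract does not state. Under that assumption the least-squares normal equations depend only on the sums of each owner's local X^T X and X^T y, so the owners never need to exchange raw rows. In an encrypted protocol these sums would be computed over ciphertexts; here they are added in the clear purely for illustration. The function names `local_stats`, `add_stats`, and `solve` are hypothetical.

```python
from fractions import Fraction

def local_stats(X, y):
    """Each owner computes X^T X and X^T y on its own rows only."""
    p = len(X[0])
    xtx = [[sum(Fraction(r[i]) * r[j] for r in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(Fraction(r[i]) * t for r, t in zip(X, y)) for i in range(p)]
    return xtx, xty

def add_stats(a, b):
    """Aggregate two owners' statistics (plaintext stand-in for the
    homomorphic addition an encrypted protocol would perform)."""
    (A1, b1), (A2, b2) = a, b
    A = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return A, [u + v for u, v in zip(b1, b2)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact rationals.
    Assumes A is square and non-singular."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

# Toy data generated from y = 1 + 2x; the first column is the intercept term.
owner_a = ([[1, 0], [1, 1]], [1, 3])
owner_b = ([[1, 2], [1, 3]], [5, 7])

# Only the aggregated statistics are combined, never the raw rows.
xtx, xty = add_stats(local_stats(*owner_a), local_stats(*owner_b))
beta = solve(xtx, xty)

# Reference result: regression on the pooled data.
pooled = solve(*local_stats(owner_a[0] + owner_b[0], owner_a[1] + owner_b[1]))
```

The coefficients obtained from the aggregated statistics match those from the pooled data, which is why a scheme that supports addition and multiplication on ciphertexts is a natural fit for this problem.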
