Privacy-preserving data mashup

Mashup is a web technology that combines information from more than one source into a single web application. This technique provides a new platform for different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In this paper, we study and resolve a real-life privacy problem in a data mashup application for the financial industry in Sweden, and propose a privacy-preserving data mashup (PPMashup) algorithm to securely integrate private data from different data providers, whereas the integrated data still retains the essential information for supporting general data exploration or a specific data mining task, such as classification analysis. Experiments on real-life data suggest that our proposed method is effective for simultaneously preserving both privacy and information usefulness, and is scalable for handling large volume of data.

[1]  Josep Domingo-Ferrer,et al.  Fast Generation of Accurate Synthetic Microdata , 2004, Privacy in Statistical Databases.

[2]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[3]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, FOCS 1982.

[4]  Umeshwar Dayal,et al.  View Definition and Generalization for Database Integration in a Multidatabase System , 1984, IEEE Transactions on Software Engineering.

[5]  Chris Clifton,et al.  Tools for privacy preserving distributed data mining , 2002, SKDD.

[6]  Benjamin C. M. Fung,et al.  Anonymizing sequential releases , 2006, KDD '06.

[7]  Noam Nisan,et al.  Algorithms for Selfish Agents , 1999, STACS.

[8]  Ramakrishnan Srikant,et al.  Privacy-preserving data mining , 2000, SIGMOD '00.

[9]  Gio Wiederhold,et al.  Intelligent integration of information , 1993, SIGMOD Conference.

[10]  Chris Clifton,et al.  A secure distributed framework for achieving k-anonymity , 2006, The VLDB Journal.

[11]  Jian Pei,et al.  Anonymity for continuous data publishing , 2008, EDBT '08.

[12]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[13]  Sushil Jajodia,et al.  Inference Problems in Multilevel Secure Database Management Systems , 2006 .

[14]  Alexandre V. Evfimievski,et al.  Information sharing across private databases , 2003, SIGMOD '03.

[15]  José Meseguer,et al.  Unwinding and Inference Control , 1984, 1984 IEEE Symposium on Security and Privacy.

[16]  Chris Clifton,et al.  Privacy-Preserving Distributed k-Anonymity , 2005, DBSec.

[17]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[18]  Philip S. Yu,et al.  Handicapping attacker's confidence: an alternative to k-anonymization , 2006, Knowledge and Information Systems.

[19]  Wenliang Du,et al.  Building decision tree classifier on private data , 2002 .

[20]  W. Winkler,et al.  MASKING MICRODATA FILES , 1995 .

[21]  Thomas H. Hinke,et al.  Inference aggregation detection in database management systems , 1988, Proceedings. 1988 IEEE Symposium on Security and Privacy.

[22]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[23]  Martin E. Hellman,et al.  An improved algorithm for computing logarithms over GF(p) and its cryptographic significance (Corresp.) , 1978, IEEE Trans. Inf. Theory.

[24]  Philip S. Yu,et al.  Anonymizing Classification Data for Privacy Preservation , 2007, IEEE Transactions on Knowledge and Data Engineering.

[25]  David J. DeWitt,et al.  Workload-aware anonymization , 2006, KDD '06.

[26]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[27]  Sheng Zhong,et al.  Privacy-Preserving Classification of Customer Data without Loss of Accuracy , 2005, SDM.

[28]  Sushil Jajodia,et al.  The inference problem: a survey , 2002, SKDD.

[29]  Yufei Tao,et al.  Anatomy: simple and effective privacy preservation , 2006, VLDB.

[30]  Harry S. Delugach,et al.  A Fast Algorithm for Detecting Second Paths in Database Inference Analysis , 1995, J. Comput. Secur..

[31]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[32]  Philip S. Yu,et al.  Top-down specialization for information and privacy preservation , 2005, 21st International Conference on Data Engineering (ICDE'05).

[33]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[34]  Yunghsiang Sam Han,et al.  Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification , 2004, SDM.

[35]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[36]  John M. Abowd,et al.  New Approaches to Confidentiality Protection: Synthetic Data, Remote Access and Research Data Centers , 2004, Privacy in Statistical Databases.

[37]  Sheng Zhong,et al.  Anonymity-preserving data collection , 2005, KDD '05.

[38]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[39]  Yufei Tao,et al.  M-invariance: towards privacy preserving re-publication of dynamic datasets , 2007, SIGMOD '07.

[40]  Dan Suciu,et al.  A formal analysis of information disclosure in data exchange , 2004, SIGMOD '04.

[41]  Raymond Chi-Wing Wong,et al.  (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing , 2006, KDD '06.

[42]  Pierangela Samarati,et al.  Generalizing Data to Provide Anonymity when Disclosing Information , 1998, PODS 1998.

[43]  Gu Si-yang,et al.  Privacy preserving association rule mining in vertically partitioned data , 2006 .