Paper information - Privacy-preserving data mashup

Privacy-preserving data mashup

Mashup is a web technology that combines information from more than one source into a single web application. This technique provides a new platform for different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In this paper, we study and resolve a real-life privacy problem in a data mashup application for the financial industry in Sweden. We propose a privacy-preserving data mashup (PPMashup) algorithm that securely integrates private data from different data providers while the integrated data still retains the essential information for supporting general data exploration or a specific data mining task, such as classification analysis. Experiments on real-life data suggest that the proposed method is effective at simultaneously preserving both privacy and information usefulness, and that it scales to large volumes of data.
