Output privacy in data mining

Privacy has been identified as a vital requirement in designing and implementing data mining systems. In general, privacy preservation demands protecting both input and output privacy: the former refers to sanitizing the raw data before mining is performed, while the latter refers to protecting the mining output (models or patterns) against malicious inference attacks. This article presents a systematic study of the problem of protecting output privacy in data mining, and in particular stream mining: (i) we highlight the importance of this problem by showing that even sufficient protection of input privacy does not guarantee output privacy; (ii) we present a general inference and disclosure model that exploits intra-window and inter-window privacy breaches in stream mining output; (iii) we propose a lightweight countermeasure that effectively eliminates these breaches without explicitly detecting them, while minimizing the loss of output accuracy; (iv) we further optimize the basic scheme by taking into account two types of semantic constraints, aiming to maximally preserve utility-related semantics while maintaining a hard privacy guarantee; (v) finally, we conduct extensive experimental evaluation over both synthetic and real data to validate the efficacy of our approach.
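To give a concrete flavor of output perturbation in this setting, the following minimal Python sketch illustrates one way a stream miner might randomize the support counts it publishes for each sliding window. This is not the paper's actual scheme; `perturb_support`, `epsilon`, and `floor_ratio` are illustrative assumptions. The idea is to keep each published count within a small relative-error bound (preserving output utility) while preventing an adversary from recovering exact counts within or across windows.

```python
import random

def perturb_support(true_count, window_size, epsilon=0.05, floor_ratio=0.01):
    """Return a randomized support count for one mined pattern.

    Illustrative sketch only: adds bounded multiplicative noise so the
    published count cannot be inverted back to the exact value, while
    keeping the relative error within +/- epsilon of the true support.
    """
    noise = random.uniform(-epsilon, epsilon)        # bounded random distortion
    noisy = int(round(true_count * (1.0 + noise)))   # stays close to the true count
    # Do not publish counts below a small floor, so that rare (and hence
    # potentially identifying) patterns are not exposed exactly.
    return max(noisy, int(window_size * floor_ratio))

# Example: publish perturbed supports for patterns mined from one window.
window_size = 10_000
mined = {("milk", "bread"): 412, ("beer", "diapers"): 37}
published = {p: perturb_support(c, window_size) for p, c in mined.items()}
print(published)
```

A practical scheme would also have to correlate the noise across overlapping windows, since independently perturbed counts of the same pattern in consecutive windows can otherwise be averaged to cancel the distortion; that inter-window aspect is exactly what the attack model above targets.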
