Butterfly: Protecting Output Privacy in Stream Mining

Privacy preservation in data mining demands protecting both input and output privacy. The former refers to sanitizing the raw data before mining is performed; the latter refers to preventing the mining output (model/pattern) from being exploited by malicious pattern-based inference attacks. Preserving input privacy does not necessarily preserve output privacy. This work studies the problem of protecting output privacy in the context of frequent pattern mining over data streams. After exposing the privacy breaches existing in current stream mining systems, we propose Butterfly, a lightweight countermeasure that effectively eliminates these breaches without explicitly detecting them, while minimizing the loss of output accuracy. We further optimize the basic scheme by taking into account two types of semantic constraints, aiming to maximally preserve utility-related semantics while maintaining hard privacy and accuracy guarantees. We conduct extensive experiments over real-life datasets to demonstrate the effectiveness and efficiency of our approach.
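To make the core idea concrete, the sketch below (in Python) illustrates output perturbation over mined frequent patterns in general terms: each pattern's reported support is randomized within a small bounded range before release, so an attacker cannot recover exact supports for pattern-based inference, while the expected distortion stays small. This is a minimal, assumption-laden sketch, not the actual Butterfly scheme; the function name `perturb_supports` and the uniform-noise choice are hypothetical stand-ins for Butterfly's calibrated perturbation.

```python
import random

def perturb_supports(patterns, max_noise=0.01, seed=None):
    """Illustrative output perturbation: randomize each pattern's
    reported relative support within +/- max_noise, clamped to [0, 1].

    NOTE: a hedged sketch of the general output-privacy idea only;
    it is NOT the Butterfly algorithm, which calibrates the noise to
    meet explicit privacy and accuracy guarantees.
    """
    rng = random.Random(seed)
    released = {}
    for pattern, support in patterns.items():
        noise = rng.uniform(-max_noise, max_noise)
        released[pattern] = min(1.0, max(0.0, support + noise))
    return released

# Example: release perturbed supports for three mined itemsets.
mined = {("bread", "milk"): 0.42, ("beer", "chips"): 0.17, ("tea",): 0.09}
print(perturb_supports(mined, max_noise=0.01, seed=7))
```

In this spirit, the released supports remain close enough to the true values for downstream use, yet no exact support is ever published, which is the intuition behind eliminating inference breaches without detecting them individually.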
