A Heuristic Data Reduction Approach for Associative Classification Rule Hiding

When data are shared between business partners, some sensitive patterns may need to be kept from the other parties. At the same time, the "quality" of the data must be preserved. This raises an interesting question: how can we share data whose quality is guaranteed while certain types of sensitive patterns are removed or "hidden"? In this paper, we address the problem of sensitive classification rule hiding using a data reduction approach, i.e., removing entire selected tuples from the given dataset. We focus on a specific type of classification rule, namely associative classification rules. In our context, a sensitive rule is hidden when its support falls below a minimum support threshold. Meanwhile, the impact on the data quality of the dataset is measured in terms of the number of false-dropped rules and the number of ghost rules. We present a few observations on data quality with regard to the data reduction process. From these observations, we can determine the impact of each reduction precisely without re-applying the classification algorithm. Subsequently, we propose a heuristic algorithm, based on these observations, to hide the sensitive rules. Experimental results are presented to show the effectiveness and efficiency of the proposed algorithm.
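To make the hiding criterion concrete, the following is a minimal sketch of hiding a rule by data reduction. It is not the heuristic algorithm proposed in the paper. It assumes that support is an absolute count of tuples matching both the rule's antecedent and its class label, and it removes supporting tuples in dataset order with no attempt to limit false-dropped or ghost rules. The names `support_count`, `hide_rule`, and `min_count` are hypothetical and chosen for illustration only.

```python
from typing import FrozenSet, List, Tuple

# A tuple of the dataset: (set of items, class label).
Row = Tuple[FrozenSet[str], str]


def support_count(data: List[Row], antecedent: FrozenSet[str], label: str) -> int:
    """Count tuples that contain the antecedent and carry the rule's class label."""
    return sum(1 for items, cls in data if antecedent <= items and cls == label)


def hide_rule(data: List[Row], antecedent: FrozenSet[str], label: str,
              min_count: int) -> List[Row]:
    """Remove whole supporting tuples until the rule's support is below min_count.

    Assumption: tuples are dropped in dataset order. A real heuristic would
    choose which tuples to drop so that side effects on other rules stay small.
    """
    remaining = support_count(data, antecedent, label)
    kept: List[Row] = []
    for items, cls in data:
        supports = antecedent <= items and cls == label
        if supports and remaining >= min_count:
            remaining -= 1  # drop this tuple entirely
            continue
        kept.append((items, cls))
    return kept


# Usage: hide the rule {a} -> yes with a minimum support count of 2.
rows: List[Row] = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a"}), "yes"),
    (frozenset({"a", "c"}), "yes"),
    (frozenset({"b"}), "no"),
]
reduced = hide_rule(rows, frozenset({"a"}), "yes", min_count=2)
```

Because entire tuples are removed, the supports of other rules covered by those tuples can also change, which is the side effect that the counts of false-dropped and ghost rules are meant to capture.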
