Paper Information - Hiding Emerging Patterns with Local Recoding Generalization

Establishing strategic partnerships often requires organizations to publish and share meaningful data to support collaborative business activities. An equally important concern is protecting sensitive patterns, such as unique emerging sales opportunities, that are embedded in that data. In this paper, we contribute to the area of data sanitization by introducing an optimization-based local recoding methodology that hides emerging patterns (EPs) in a dataset while preserving the underlying frequent itemsets as far as possible. We propose a novel heuristic solution that captures the unique properties of hiding EPs to carry out iterative local recoding generalization. We also propose a metric that measures (i) frequent-itemset distortion, which quantifies the quality of the published data, and (ii) the degree of reduction in emerging patterns; this metric guides a bottom-up recoding process. We have implemented the proposed solution and experimentally verified its effectiveness on a benchmark dataset.
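To make the terms in the abstract concrete, the following is a minimal Python sketch rather than the authors' algorithm. It assumes the standard definition of an emerging pattern (an itemset whose support grows from one class of records to another by at least a threshold factor) and shows how generalizing values in only some records, which is what "local recoding" refers to, can remove that growth. The toy records, the attribute values, the threshold `RHO`, and the helper names `support`, `growth_rate`, and `recode_local` are all hypothetical; the paper's heuristic and its distortion metric are not described in this text and are not reproduced here.

```python
from typing import Callable, Dict, FrozenSet, List

Record = FrozenSet[str]


def support(itemset: Record, records: List[Record]) -> float:
    """Fraction of records that contain every item of the itemset."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)


def growth_rate(itemset: Record, d1: List[Record], d2: List[Record]) -> float:
    """Support in d2 divided by support in d1 (infinite if absent from d1 only)."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        return float("inf") if s2 > 0 else 0.0
    return s2 / s1


def recode_local(
    records: List[Record],
    mapping: Dict[str, str],
    applies_to: Callable[[Record], bool],
) -> List[Record]:
    """Generalize items via `mapping`, but only in records selected by `applies_to`."""
    return [
        frozenset(mapping.get(item, item) for item in r) if applies_to(r) else r
        for r in records
    ]


# Hypothetical toy data: two classes of records.
d1 = [
    frozenset({"age=20-29", "job=engineer"}),
    frozenset({"age=30-39", "job=engineer"}),
    frozenset({"age=30-39", "job=engineer"}),
    frozenset({"age=30-39", "job=clerk"}),
]
d2 = [
    frozenset({"age=20-29", "job=engineer"}),
    frozenset({"age=20-29", "job=engineer"}),
    frozenset({"age=20-29", "job=engineer"}),
    frozenset({"age=30-39", "job=clerk"}),
]

RHO = 2.0  # assumed growth-rate threshold for calling a pattern "emerging"
ep = frozenset({"age=20-29", "job=engineer"})
before = growth_rate(ep, d1, d2)

# Merge the two age ranges, but only in engineer records (local, not global).
mapping = {"age=20-29": "age=20-39", "age=30-39": "age=20-39"}
is_engineer = lambda r: "job=engineer" in r
r1 = recode_local(d1, mapping, is_engineer)
r2 = recode_local(d2, mapping, is_engineer)

generalized = frozenset({"age=20-39", "job=engineer"})
after = growth_rate(generalized, r1, r2)

print(before >= RHO, after >= RHO)
```

In this toy case the growth rate of the pattern drops from 3.0 to 1.0 after recoding, so it no longer meets the assumed threshold, while the clerk records keep their original, more specific age value. A real method would also have to weigh how much such generalization distorts the frequent itemsets of the published data.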

Byron Choi | William Kwok-Wai Cheung | Michael W. K. Cheng
