A Wavelet-Based Approach to Preserve Privacy for Classification Mining

Despite the commercial success of data mining, academia, industry, and government have all acknowledged a major drawback: it can violate the privacy of individuals. We propose a wavelet-based data transformation method that disguises private data while preserving the original classification patterns. Wavelet transforms have been used extensively in signal processing for data reduction, multiresolution analysis, and noise removal. In our implementation, two commonly used wavelet transforms, the Haar and the Daub-4 transforms, are tested for pattern and privacy preservation in classification mining tasks. Empirical results confirm that both transforms preserve classification patterns and privacy for real-valued data.
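To make the idea concrete, the following is a minimal sketch of a single level of the orthonormal Haar transform applied to one record of real-valued attributes. It is an illustration only: the abstract does not say whether the transform is applied per record or per attribute, how many decomposition levels are used, or which coefficients are released, so the row-wise layout, the function names `haar_step` and `inverse_haar_step`, and the sample record are all assumptions.

```python
import math


def haar_step(values):
    """One level of the orthonormal Haar transform.

    Takes an even-length sequence of numbers and returns two lists,
    (approximation, detail), each half as long as the input.
    """
    if len(values) % 2 != 0:
        raise ValueError("input length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums capture the coarse trend of the record.
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    # Scaled pairwise differences capture the fine detail.
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Rebuild the original sequence from both coefficient lists."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


# Hypothetical record with four real-valued attributes (not from the paper).
record = [5.1, 3.5, 1.4, 0.2]
approx, detail = haar_step(record)
print("approximation:", approx)
print("detail:", detail)
```

Given both coefficient lists, `inverse_haar_step` recovers the record exactly. The approximation coefficients alone carry only scaled pairwise sums, so the individual attribute values cannot be read back from them without the detail coefficients.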
