Sensitive Outlier Protection in Privacy Preserving Data Mining

Data mining is the extraction of hidden predictive information from large databases, and a powerful technology with great potential to help organizations analyze important information in their data warehouses. Privacy-preserving data mining is a recent research area within data mining that deals with the side effects of data mining techniques. Privacy is defined as “protecting individual’s information”, and its protection has become an important issue in data mining research. Sensitive outlier protection is a novel topic in this field. Clustering is the division of data into groups of similar objects, and outlier detection is one of the main tasks in data mining; clustering algorithms can be used to detect outliers efficiently. In this paper we use four clustering algorithms to detect outliers and propose a new privacy technique, the Gaussian Perturbation Random Method, to protect sensitive outliers in health data sets.
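
The general idea of detecting outliers and then masking them with Gaussian noise can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the four clustering algorithms or the details of the proposed technique, so a simple distance-from-mean rule stands in for the clustering step here, and the function names, the threshold, the noise scale `sigma`, and the sample readings are all assumptions made for illustration.

```python
import random
import statistics


def flag_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean.

    Stand-in for a clustering-based detector (assumption, not the paper's algorithm).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def perturb_gaussian(values, indices, sigma, rng):
    """Return a copy of `values` with zero-mean Gaussian noise added at `indices`."""
    perturbed = list(values)
    for i in indices:
        perturbed[i] = values[i] + rng.gauss(0.0, sigma)
    return perturbed


# Hypothetical health readings; the value 210 is the unusual record.
readings = [118, 122, 125, 119, 121, 210, 124, 120]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

outlier_idx = flag_outliers(readings)
released = perturb_gaussian(readings, outlier_idx, sigma=5.0, rng=rng)
```

Only the flagged records are altered, so the remaining values are released unchanged. The choice of `sigma` trades privacy against accuracy: larger noise hides the original value better but distorts the released data more.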
