Privacy Preserving Outlier Detection Using Hierarchical Clustering Methods

Data objects that do not comply with the general behavior or model of the data are called outliers. Outlier detection in databases has numerous applications, such as fraud detection, customized marketing, and the search for terrorist activity. However, its use has raised concerns about violations of individual privacy. Privacy-preserving outlier detection must therefore address and balance these concerns, so that the data analyst can obtain the benefits of outlier detection without being thwarted by legal countermeasures from privacy advocates. In this paper, we propose a technique that uses hierarchical clustering methods to detect outliers while preserving privacy. We analyze the technique to quantify the privacy it preserves, and we prove that reverse engineering the perturbed data is extremely difficult.
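The abstract does not specify the perturbation or the clustering variant, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes 2-D data, a rotation about the origin as a stand-in perturbation (chosen because it leaves pairwise distances unchanged), single-linkage agglomerative clustering, and a rule that treats members of very small clusters as outliers. The names `rotate`, `single_linkage`, `outliers`, `cut`, and `min_size` are hypothetical.

```python
import math


def rotate(points, theta):
    """Assumed perturbation: rotate 2-D points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def single_linkage(points, cut):
    """Agglomerative single-linkage clustering.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `cut`. Returns clusters as lists of indices.
    """
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the nearest members of a and b.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, i, j = min(
            (gap(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cut:
            break
        merged = clusters.pop(j)  # j > i, so index i is still valid
        clusters[i].extend(merged)
    return clusters


def outliers(points, cut, min_size):
    """Indices of points that end up in clusters smaller than `min_size`."""
    return sorted(
        i
        for cluster in single_linkage(points, cut)
        if len(cluster) < min_size
        for i in cluster
    )


# Two tight groups plus one isolated point (index 8); illustrative data only.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (5, 30)]
perturbed = rotate(data, 0.7)

print(outliers(data, cut=2.0, min_size=2))
print(outliers(perturbed, cut=2.0, min_size=2))
```

Because a rotation preserves Euclidean distances, the analyst working only on `perturbed` should flag the same indices as on the original data. Whether such a transformation is hard to reverse is exactly the kind of question the paper's privacy analysis addresses; this sketch makes no claim about it.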
