Cost Effective Dynamic Concept Hierarchy Generation for Preserving Privacy

The explosive growth of information on the Internet has raised threats to individual privacy. k-Anonymity and l-diversity are two well-known techniques proposed to address these threats. Both rely on concept hierarchy tree (CHT)-based generalization/suppression. Several CHTs can be constructed for a given attribute, and an appropriate one must be chosen for anonymization of that attribute to be effective. This paper discusses an on-the-fly approach for constructing a CHT that can be used for generalization/suppression. Furthermore, to improve anonymization, the CHT can be dynamically adjusted for a given k value. The proposed approach is evaluated with two known anonymization methods, k-member clustering and the Mondrian multidimensional algorithm, and compared against six hierarchy methods: (1) improved on-the-fly hierarchy (IOTF) (Campan et al., 2011), (2) on-the-fly hierarchy (OTF) (Campan and Cooper, 2010), (3) hierarchy free (HF) (LeFevre et al., 2006), (4) predefined hierarchy (PH) (Iyengar, 2002), (5) CHU (Chu and Chiang, 1994), and (6) HAN (Han and Fu, 1994). The evaluation metrics are (a) information loss, (b) the discernibility metric, and (c) the normalized average equivalence class size metric. Experimental results indicate that our approach is more effective and flexible. With the Mondrian multidimensional algorithm, its utility is 12% better than IOTF, 16% better than OTF and CHU, 17% better than PH, and 21% better than HAN. With k-member clustering, its utility is 1% better than IOTF, 2% better than OTF, 3% better than CHU, 5% better than PH, and 14% better than HAN.
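
The abstract names its evaluation metrics without defining them. As a minimal sketch, assuming the forms commonly used in the k-anonymity literature (the paper's exact definitions may differ, and the penalty for suppressed records is ignored here), the discernibility metric and the normalized average equivalence class size could be computed as follows. The function names and the toy records are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Group records by their generalized quasi-identifier values
    and return the size of each resulting equivalence class."""
    counts = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return list(counts.values())


def discernibility_metric(sizes):
    """Assumed form: sum of squared equivalence class sizes
    (no suppression penalty). Lower is better."""
    return sum(size * size for size in sizes)


def normalized_avg_class_size(sizes, k):
    """Assumed form: (total records / number of classes) / k.
    A value of 1.0 means every class has exactly k records on average."""
    return (sum(sizes) / len(sizes)) / k


# Hypothetical table that has already been generalized.
records = [
    {"age": "20-29", "zip": "130**"},
    {"age": "20-29", "zip": "130**"},
    {"age": "30-39", "zip": "148**"},
    {"age": "30-39", "zip": "148**"},
    {"age": "30-39", "zip": "148**"},
]

sizes = equivalence_class_sizes(records, ["age", "zip"])
dm = discernibility_metric(sizes)
c_avg = normalized_avg_class_size(sizes, k=2)
```

Under these assumed definitions, a hierarchy that generalizes less aggressively for the same k tends to produce smaller equivalence classes and therefore lower values on both metrics.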
