Ontology-Enhanced Interactive Anonymization in Domain-Driven Data Mining Outsourcing

This paper focuses on a domain-driven data mining outsourcing scenario in which a data owner publishes data to an application service provider, which returns mining results. To protect data privacy against an untrusted party, anonymization is required. Anonymization is a widely used technique that preserves true attribute values and supports a variety of data mining algorithms. Several issues emerge when anonymization is applied in a real-world outsourcing scenario. Most existing methods target the traditional data mining paradigm. As a result, they neither incorporate domain knowledge nor optimize data for domain-driven usage. Furthermore, existing techniques are mostly non-interactive. They give users little control while assuming that users can readily produce Domain Generalization Hierarchies (DGHs). Moreover, previous utility metrics have not considered attribute correlations during generalization. These concerns must be addressed to achieve both data privacy and actionable patterns in a real-world setting. This paper proposes an anonymization framework that aids users in a domain-driven data mining outsourcing scenario. The framework comprises several components designed to anonymize data while preserving the meaningful, actionable patterns that can be discovered after mining. In contrast with existing work for traditional data mining, the framework integrates domain ontology knowledge during DGH creation so that values retain their meanings after anonymization. In addition, users can specify constraints based on their mining tasks, thereby controlling how data generalization is performed. Finally, attribute correlations are calculated to ensure that important features are preserved. Preliminary experiments show that an ontology-based DGH preserves semantic meaning after attribute generalization. They also suggest that using the chi-square statistic as a correlation measure may improve attribute selection before generalization.
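The following Python sketch illustrates how a DGH and a chi-square correlation measure might interact. It uses a hand-written two-level hierarchy and eight toy records. Several elements are hypothetical and are not the paper's framework: the hierarchy, the contrasting "arbitrary" grouping, the attribute names, the records, and the helpers `generalize` and `chi_square`. The sketch assumes the ontology groups diagnoses by organ system.

```python
from collections import Counter
from itertools import product

# Hypothetical DGH for a "diagnosis" attribute, stored as child -> parent links.
# In an ontology-based DGH these links would be derived from a domain ontology.
DGH = {
    "angina": "heart disease",
    "myocardial infarction": "heart disease",
    "asthma": "respiratory disease",
    "bronchitis": "respiratory disease",
    "heart disease": "any",
    "respiratory disease": "any",
}

# Hypothetical grouping that ignores meaning (alphabetical), used for contrast.
ARBITRARY = {
    "angina": "A-group",
    "asthma": "A-group",
    "bronchitis": "B-M group",
    "myocardial infarction": "B-M group",
}


def generalize(value, hierarchy, levels=1):
    """Replace a value by its ancestor `levels` steps up; stop at the root."""
    for _ in range(levels):
        value = hierarchy.get(value, value)
    return value


def chi_square(xs, ys):
    """Pearson chi-square statistic for the contingency table of xs vs ys."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    stat = 0.0
    for x, y in product(x_counts, y_counts):
        expected = x_counts[x] * y_counts[y] / n
        stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


# Toy records: two quasi-identifiers plus a class attribute used by the mining task.
diagnosis = ["angina", "myocardial infarction", "angina", "asthma",
             "bronchitis", "asthma", "myocardial infarction", "bronchitis"]
zipcode = ["4000", "4001", "4001", "4000", "4001", "4000", "4000", "4001"]
high_risk = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]

# Rank attributes by correlation with the class. Highly correlated attributes
# are candidates to keep at a finer granularity during generalization.
scores = {
    "diagnosis": chi_square(diagnosis, high_risk),
    "zipcode": chi_square(zipcode, high_risk),
}
ranking = sorted(scores, key=scores.get, reverse=True)

# Generalize diagnosis one level with each hierarchy.
semantic = [generalize(v, DGH) for v in diagnosis]
arbitrary = [generalize(v, ARBITRARY) for v in diagnosis]

print(ranking)                            # ['diagnosis', 'zipcode']
print(chi_square(semantic, high_risk))    # 8.0
print(chi_square(arbitrary, high_risk))   # 0.0
```

In this toy case, generalizing along the meaning-preserving hierarchy keeps the statistic at 8.0. The arbitrary grouping drops it to 0.0, which is the kind of loss a semantics-unaware DGH can cause. A fuller treatment would also need a significance test, because the degrees of freedom change with granularity. It would further need a privacy criterion such as k-anonymity. Both are omitted here.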