Utility Aware Clustering for Publishing Transactional Data

This work aims to maximise the utility of published data in partition-based anonymisation of transactional data. We observe that optimising the clustering, i.e. the horizontal partitioning, can significantly improve the utility of the published data without affecting the privacy guarantees. We present a new clustering method with a specially designed distance function that accounts for the effect of sensitive terms on the privacy goal as part of the clustering process. As a result, when the clustering minimises the total intra-cluster distance of the partition, the utility loss is minimised as well. We present two algorithms: DocClust, which clusters transactions, and DetK, which determines the best number of clusters.
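To make the idea of a sensitivity-aware distance concrete, the following is a minimal sketch and not the distance function used by DocClust, which the abstract does not specify. It assumes that transactions are sets of terms, that Jaccard distance measures dissimilarity on non-sensitive terms, and that two transactions sharing a sensitive term incur a fixed penalty (the hypothetical parameter `penalty`), since co-locating them would make that term harder to hide within a cluster. All function names are illustrative.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def sensitive_aware_distance(a, b, sensitive, penalty=1.0):
    """Hypothetical distance (an assumption, not the paper's definition):
    Jaccard distance on non-sensitive terms, plus a fixed penalty when
    the two transactions share at least one sensitive term."""
    base = jaccard_distance(a - sensitive, b - sensitive)
    shares_sensitive = bool(a & b & sensitive)
    return base + (penalty if shares_sensitive else 0.0)


def total_intra_cluster_distance(clusters, sensitive, penalty=1.0):
    """Sum of pairwise distances inside every cluster of a partition."""
    return sum(
        sensitive_aware_distance(a, b, sensitive, penalty)
        for cluster in clusters
        for a, b in combinations(cluster, 2)
    )


# Toy data, invented for illustration only.
sensitive_terms = {"hiv_test", "flu_shot"}
t1 = {"bread", "milk", "hiv_test"}
t2 = {"bread", "milk", "flu_shot"}
t3 = {"wine", "cheese", "hiv_test"}
t4 = {"wine", "cheese"}

partition_a = [[t1, t2], [t3, t4]]  # similar baskets, distinct sensitive terms
partition_b = [[t1, t3], [t2, t4]]  # dissimilar baskets, shared sensitive term

print(total_intra_cluster_distance(partition_a, sensitive_terms))  # 0.0
print(total_intra_cluster_distance(partition_b, sensitive_terms))  # 3.0
```

Under these assumptions, the partition with the lower total intra-cluster distance groups transactions that are similar on non-sensitive terms while keeping identical sensitive terms apart, which is the kind of partition the abstract associates with lower utility loss.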
