High-Dimensional Data

Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximize data utility). However, existing techniques adopt an indexing- or clustering-based approach, and work well for fixed-schema data, with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transaction data (or basket data), which involves hundreds or even thousands of dimensions, rendering existing methods unusable. We propose a novel anonymization method for sparse high-dimensional data. We employ a particular representation that captures the correlation in the underlying data, and facilitates the formation of anonymized groups with low information loss. We propose an efficient anonymization algorithm based on this representation. We show experimentally, using real-life datasets, that our method clearly outperforms existing state-of-the-art in terms of both data utility and computational overhead.
