Paper Information - High-Dimensional Data

High-Dimensional Data

Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximize data utility). However, existing techniques adopt an indexing- or clustering-based approach, and work well for fixed-schema data with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transaction data (or basket data), which involves hundreds or even thousands of dimensions, rendering existing methods unusable. We propose a novel anonymization method for sparse high-dimensional data. We employ a particular representation that captures the correlation in the underlying data and facilitates the formation of anonymized groups with low information loss. We propose an efficient anonymization algorithm based on this representation. We show experimentally, using real-life datasets, that our method clearly outperforms the existing state of the art in terms of both data utility and computational overhead.

Yufei Tao | Gabriel Ghinita
