A privacy-preserving approach for multimodal transaction data integrated analysis

Multimodal transaction data mining has recently received a great deal of attention, and protecting private information is an essential requirement of such analysis. Existing work on privacy protection for transaction data usually focuses on a single-mode dataset. These privacy-preserving methods cannot be applied directly to multimodal data integration, because data correlations among multiple heterogeneous datasets may cause information leakage. In this work, we address privacy protection for the integration of transaction data and trajectory data. We first present a model of the privacy leakage caused by integrating multimodal datasets, in which the integrated data are represented as a tree. To prevent identity disclosure through trajectories, we partition location sequences to meet the privacy requirements and copy locations to offset the information loss caused by partitioning. Then, to prevent sensitive item disclosure through transactions, we use suppression to eliminate sensitive association rules. Building on these techniques, we propose a k^m-anonymity-ρ-uncertainty privacy model that protects private information when transaction data are integrated with trajectory data in a tree-structured data model. Finally, we perform experiments on two synthetic integrated datasets and analyze privacy and information loss under varying parameters.
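The abstract does not spell out the algorithms, so the following is only a minimal Python sketch of the ingredients it names, under stated assumptions. Each individual is treated as a root with a trajectory child (an ordered list of locations) and a transaction child (a set of items). The toy data, the function names (`linked_transactions`, `km_violations`, `rho_violations`, `suppress_until_safe`), and the thresholds are hypothetical. The sketch assumes the usual readings of the two conditions. Under k^m-anonymity, every sequence of at most m locations that occurs in the data must be a subsequence of at least k trajectories. Under ρ-uncertainty, every sensitive rule X → s must have confidence at most ρ. `linked_transactions` illustrates the cross-modal leakage: knowing that someone visited a rare location exposes the linked transaction. The greedy global suppression is a stand-in heuristic, not the paper's method, and the partition-and-copy step for trajectories is not implemented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy integration: each individual is the root of a small tree with a
# trajectory child (ordered locations) and a transaction child (set of items).
RECORDS = {
    "u1": (["a", "b", "c"], {"bread", "milk"}),
    "u2": (["a", "b", "d"], {"bread", "flu"}),
    "u3": (["a", "c"], {"milk", "flu"}),
    "u4": (["b", "c"], {"bread", "milk", "flu"}),
}


def is_subsequence(q, seq):
    """True if the locations in q occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(loc in it for loc in q)


def linked_transactions(records, known):
    """Transactions reachable via trajectories matching an attacker's known locations."""
    return [items for traj, items in records.values() if is_subsequence(known, traj)]


def km_violations(trajectories, k, m):
    """Location sequences of length <= m that occur but in fewer than k trajectories."""
    bad = set()
    for traj in trajectories:
        for size in range(1, m + 1):
            for q in combinations(traj, size):  # order-preserving subsequences
                support = sum(is_subsequence(q, t) for t in trajectories)
                if support < k:
                    bad.add(q)
    return bad


def rho_violations(transactions, sensitive, rho, max_lhs=2):
    """Rules X -> s (s sensitive, s not in X, 1 <= |X| <= max_lhs) with confidence > rho."""
    items = sorted(set().union(*transactions))
    bad = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(items, size):
            lhs_set = set(lhs)
            sup_lhs = sum(lhs_set <= t for t in transactions)
            if sup_lhs == 0:
                continue
            for s in sorted(sensitive - lhs_set):
                sup_rule = sum((lhs_set | {s}) <= t for t in transactions)
                if sup_rule / sup_lhs > rho:
                    bad.append((lhs, s, sup_rule / sup_lhs))
    return bad


def suppress_until_safe(transactions, sensitive, rho, max_lhs=2):
    """Naive global suppression (assumes 0 <= rho < 1): repeatedly remove, from every
    transaction, the item that appears in the most violating rules."""
    data = [set(t) for t in transactions]
    suppressed = set()
    while True:
        bad = rho_violations(data, sensitive, rho, max_lhs)
        if not bad:
            return data, suppressed
        counts = Counter()
        for lhs, s, _ in bad:
            counts.update(lhs)
            counts[s] += 1
        victim = max(counts, key=lambda i: (counts[i], i))  # deterministic tie-break
        suppressed.add(victim)
        data = [t - {victim} for t in data]


trajectories = [traj for traj, _ in RECORDS.values()]
transactions = [items for _, items in RECORDS.values()]
print(linked_transactions(RECORDS, ("d",)))  # only one record visited d
print(km_violations(trajectories, k=2, m=1))
safe, removed = suppress_until_safe(transactions, {"flu"}, rho=0.5)
print(removed, rho_violations(safe, {"flu"}, rho=0.5))
```

Both checks enumerate candidates by brute force, which is exponential in m and `max_lhs`, so the sketch only makes the definitions concrete on toy data.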
