JS-Reduce: Defending Your Data from Sequential Background Knowledge Attacks

Web queries, credit card transactions, and medical records are examples of transaction data that flow into corporate data stores and often reveal associations between individuals and sensitive information. These data are commonly released serially, in non-aggregated form, to partner institutions or data analysis centers. In this paper, we show that an adversary who observes multiple data releases can easily exploit correlations among the sensitive values associated with the same individuals in different releases to violate users' privacy, even when state-of-the-art privacy protection techniques are applied. We show how an adversary can actually obtain this sequential background knowledge and use it to identify an individual's sensitive values with high confidence. Our proposed defense algorithm is based on the Jensen-Shannon divergence, and experiments show that it outperforms other applicable solutions. To the best of our knowledge, this is the first work that systematically investigates the role of sequential background knowledge in the serial release of transaction data.
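The abstract names the Jensen-Shannon divergence as the basis of the defense but does not state it. Under the standard definition, JS(P‖Q) = ½·KL(P‖M) + ½·KL(Q‖M) with M = ½(P + Q). It is symmetric, always finite, and bounded in [0, 1] when logarithms are taken base 2. The following is a minimal Python sketch of the divergence alone. It assumes both inputs are discrete probability distributions over the same ordered set of sensitive values. The function names are illustrative, and the sketch does not reproduce the JS-Reduce algorithm, whose details are not given here.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p_i == 0 contribute 0 by convention. The caller must ensure
    q_i > 0 wherever p_i > 0 (always true when q is the JS mixture below).
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits; symmetric and bounded in [0, 1].

    Assumes p and q are probability vectors over the same ordered support
    (for example, hypothetical frequencies of sensitive values in a group).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    # Mixture distribution M = (P + Q) / 2; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Identical distributions have divergence 0, and distributions with disjoint supports reach the maximum of 1 bit. This boundedness is one reason the JS divergence is often preferred over the raw KL divergence, which can be infinite, when comparing empirical distributions of sensitive values.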