Data Leakage Detection

We study the following problem: a data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data are later leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more of the agents, as opposed to having been gathered independently by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leaks. These methods do not rely on altering the released data (e.g., with watermarks). In some cases, we can also inject “realistic but fake” data records to further improve our chances of detecting a leak and identifying the guilty party.
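To make the likelihood assessment concrete, the following is a minimal sketch under a simple independence model, which is assumed here for illustration and not stated in the abstract. In this model, each leaked object was either obtained independently with probability p_guess or leaked by one of the agents holding it, each equally likely. The function name guilt_probability, the parameter p_guess, and the sample allocation are all hypothetical.

```python
def guilt_probability(agent, leaked, allocation, p_guess=0.2):
    """Estimate the probability that `agent` contributed to the leak.

    Assumed model (illustrative only): each leaked object was either
    obtained independently with probability `p_guess`, or leaked by one
    of the agents that hold it, each holder being equally likely.
    """
    # Only leaked objects that this agent actually received can implicate it.
    held = allocation[agent] & leaked
    not_guilty = 1.0
    for obj in held:
        # Number of agents that received this object.
        holders = sum(1 for objs in allocation.values() if obj in objs)
        # Probability that this agent did NOT leak this particular object.
        not_guilty *= 1.0 - (1.0 - p_guess) / holders
    return 1.0 - not_guilty


# Hypothetical allocation of records to agents.
allocation = {
    "agent_a": {"r1", "r2", "r3"},
    "agent_b": {"r3", "r4"},
    "agent_c": {"r5"},
}
leaked = {"r1", "r2", "r3"}

scores = {a: guilt_probability(a, leaked, allocation) for a in allocation}
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.3f}")
```

In this sketch, an agent who alone holds several of the leaked records scores much higher than one who shares a single leaked record with another agent. This illustrates why allocations with little overlap between agents make the source of a leak easier to identify.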
