A Concentration of Measure Approach to Database De-anonymization

This paper investigates the matching of correlated high-dimensional databases. A stochastic database model is considered in which the correlation among the database entries is governed by an arbitrary joint distribution. Concentration of measure results, such as typicality and the law of large numbers, are used to develop a database matching scheme and to derive sufficient conditions for successful matching. Furthermore, these conditions are shown to be tight through a converse result, which characterizes a set of distributions on the database entries for which reliable matching is not possible. The resulting necessary and sufficient conditions for reliable matching are evaluated both when the database entries are independent and identically distributed and under Markovian database models.
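
To make the idea of typicality-based matching concrete, the following is a minimal sketch for the i.i.d. case only. It is not the scheme from the paper: it assumes binary entries, a known joint distribution `p_xy`, and a simple test that declares two rows matched when their empirical joint distribution is within `eps` of `p_xy` in every entry. All names (`joint_type`, `jointly_typical`, `match_rows`, `demo`) and all parameter values are hypothetical and chosen for illustration.

```python
import random


def joint_type(x, y, p_xy):
    """Empirical joint distribution (joint type) of the column-wise pairs (x_k, y_k)."""
    counts = dict.fromkeys(p_xy, 0)
    for pair in zip(x, y):
        counts[pair] = counts.get(pair, 0) + 1
    return {pair: c / len(x) for pair, c in counts.items()}


def jointly_typical(x, y, p_xy, eps):
    """True if every entry of the joint type is within eps of the assumed p_xy."""
    q = joint_type(x, y, p_xy)
    return all(abs(q[pair] - p_xy.get(pair, 0.0)) <= eps for pair in q)


def match_rows(db1, db2, p_xy, eps):
    """For each row of db1, return the index of the unique jointly typical
    row of db2, or None if there is no such row or more than one."""
    result = []
    for x in db1:
        candidates = [j for j, y in enumerate(db2)
                      if jointly_typical(x, y, p_xy, eps)]
        result.append(candidates[0] if len(candidates) == 1 else None)
    return result


def demo(seed=0, n_rows=20, n_cols=2000, flip=0.1, eps=0.08):
    """Build two correlated binary databases and try to re-identify the rows.

    Assumption: each entry of db2 is the matching entry of db1 flipped with
    probability `flip`, and the rows of db2 are stored in a secret order.
    """
    rng = random.Random(seed)
    db1 = [[rng.randint(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]
    truth = list(range(n_rows))
    rng.shuffle(truth)  # truth[i] = row of db2 holding the noisy copy of db1[i]
    db2 = [None] * n_rows
    for i, j in enumerate(truth):
        db2[j] = [b ^ (rng.random() < flip) for b in db1[i]]
    # Joint distribution implied by uniform bits and independent flips.
    p_xy = {(0, 0): (1 - flip) / 2, (0, 1): flip / 2,
            (1, 0): flip / 2, (1, 1): (1 - flip) / 2}
    return truth, match_rows(db1, db2, p_xy, eps)


if __name__ == "__main__":
    truth, estimate = demo()
    correct = sum(t == e for t, e in zip(truth, estimate))
    print(f"correctly matched {correct} of {len(truth)} rows")
```

With many columns per row, the joint type of a correctly paired row concentrates around `p_xy`, while an incorrectly paired row looks like two independent sequences and fails the test. This exhaustive pairwise search costs time quadratic in the number of rows and is meant only to illustrate the concentration argument.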
