Privacy Risk in Anonymized Heterogeneous Information Networks

Anonymized user datasets are often released for research or industry applications. As an example, t.qq.com released its anonymized users’ profile, social interaction, and recommendation log data in KDD Cup 2012 to call for recommendation algorithms. Since the entities (users and so on) and edges (links among entities) are of multiple types, the released social network is a heterogeneous information network. Prior work has shown how privacy can be compromised in homogeneous information networks by the use of specific types of graph patterns. We show how the extra information derived from heterogeneity can be used to relax these assumptions. To characterize and demonstrate this added threat, we formally define privacy risk in an anonymized heterogeneous information network to identify the vulnerability in the possible way such data are released, and further present a new de-anonymization attack that exploits the vulnerability. Our attack successfully de-anonymized most individuals involved in the data—for an anonymized 1,000user t.qq.com network of density 0.01, the attack precision is over 90% with a 2.3-million-user auxiliary network.

[1]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[2]  Jian Pei,et al.  A brief survey on anonymization techniques for privacy preserving publishing of social network data , 2008, SKDD.

[3]  John Scott What is social network analysis , 2010 .

[4]  Dawn Xiaodong Song,et al.  On the Feasibility of Internet-Scale Author Identification , 2012, 2012 IEEE Symposium on Security and Privacy.

[5]  Yanghua Xiao,et al.  k-symmetry model for identity anonymization in social networks , 2010, EDBT '10.

[6]  Jian Pei,et al.  Preserving Privacy in Social Networks Against Neighborhood Attacks , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[7]  Vitaly Shmatikov,et al.  De-anonymizing Social Networks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[8]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[9]  Yizhou Sun,et al.  Mining Heterogeneous Information Networks: Principles and Methodologies , 2012, Mining Heterogeneous Information Networks: Principles and Methodologies.

[10]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[11]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[12]  Jon M. Kleinberg,et al.  Wherefore art thou R3579X? , 2011, Commun. ACM.

[13]  Stanley Wasserman,et al.  Social Network Analysis: Methods and Applications , 1994, Structural analysis in the social sciences.

[14]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[15]  Jia Liu,et al.  K-isomorphism: privacy preserving network publication against structural attacks , 2010, SIGMOD Conference.

[16]  Richard M. Karp,et al.  A n^5/2 Algorithm for Maximum Matchings in Bipartite Graphs , 1971, SWAT.

[17]  Vern Paxson,et al.  @spam: the underground on 140 characters or less , 2010, CCS '10.

[18]  Mark Newman,et al.  Networks: An Introduction , 2010 .

[19]  Philip S. Yu,et al.  Mining Knowledge from Data: An Information Network Analysis Approach , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[20]  Lei Zou,et al.  K-Automorphism: A General Framework For Privacy Preserving Network Publication , 2009, Proc. VLDB Endow..

[21]  Letizia Tanca,et al.  Proceedings of the 19th International Conference on Extending Database Technology, EDBT , 2016, EDBT 2016.

[22]  K. Liu,et al.  Towards identity anonymization on graphs , 2008, SIGMOD Conference.

[23]  Stanley Wasserman,et al.  Social Network Analysis: Methods and Applications , 1994 .

[24]  Richard M. Karp,et al.  A n^5/2 Algorithm for Maximum Matchings in Bipartite Graphs , 1971, SWAT.