Privacy Inference on Knowledge Graphs: Hardness and Approximation

Rapid information propagation benefits work and daily life to an unprecedented degree, but it has also greatly exacerbated the risk and consequences of privacy invasion. Today's attackers are increasingly capable of gathering personal information from many sources and mining these data to uncover further private details about users. Many previous works have shown that, with adequate background knowledge, attackers can even infer sensitive information that was never disclosed to any malicious party. In this paper, we model the attacker's knowledge as a knowledge graph and formally define the privacy inference problem. We show that the problem is #P-hard and design an approximation algorithm that performs privacy inference iteratively, which also reflects how real-life networks evolve. Simulations on two data sets demonstrate the feasibility and efficacy of privacy inference using knowledge graphs.
