Privacy Preserving Semi-supervised Learning for Labeled Graphs

We propose a novel privacy-preserving learning algorithm for semi-supervised learning on graphs. In real-world networks, such as networks of disease infection among individuals, links (contacts) and labels (infections) are often highly sensitive information. Although traditional semi-supervised learning methods play an important role in network data analysis, they fail to protect such sensitive information. Our solutions make it possible to predict the labels of partially labeled graphs without disclosing labels or links, by incorporating cryptographic techniques into the label propagation algorithm. Even when the labels in the graph are kept private, the accuracy of our privacy-preserving label propagation (PPLP) is equivalent to that of label propagation that is allowed to observe all labels in the graph. Empirical analysis showed that our solution scales well compared with existing privacy-preserving methods. Results on human contact networks showed that our protocol takes only about 10 seconds of computation and that no sensitive information is disclosed through the protocol execution.
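For orientation, the non-private baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of plain label propagation with clamped labels, assuming a row-normalized weight matrix and a fixed number of iterations; it is not the cryptographic protocol proposed here, and the function name `label_propagation` and its parameters are hypothetical.

```python
def label_propagation(weights, labels, num_classes, iterations=100):
    """Plain (non-private) label propagation with clamped labels.

    weights: n x n list of lists of non-negative edge weights
    labels:  list of length n; a class index for labeled nodes, None otherwise
    Returns the predicted class index for every node.
    """
    n = len(weights)

    # Row-normalize the weights into transition probabilities.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total > 0 else 0.0 for w in row])

    def initial(i):
        # One-hot scores for labeled nodes, uniform scores for unlabeled ones.
        if labels[i] is None:
            return [1.0 / num_classes] * num_classes
        return [1.0 if c == labels[i] else 0.0 for c in range(num_classes)]

    scores = [initial(i) for i in range(n)]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp: labeled nodes keep their known label.
                new_scores.append(initial(i))
            else:
                # Unlabeled nodes take the weighted average of their neighbors.
                new_scores.append([
                    sum(trans[i][j] * scores[j][c] for j in range(n))
                    for c in range(num_classes)
                ])
        scores = new_scores

    # Predict the highest-scoring class for each node.
    return [max(range(num_classes), key=lambda c: s[c]) for s in scores]


# Hypothetical example: a path graph 0 - 1 - 2 - 3 with only the ends labeled.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
predicted = label_propagation(W, [0, None, None, 1], num_classes=2)
```

In this sketch every node's links and labels are visible to whoever runs the loop, which is exactly the disclosure that the privacy-preserving variant is meant to avoid.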
