Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking

Deep neural networks have become a primary tool for solving problems in many fields. They are also used for addressing information retrieval problems and show strong performance in several tasks. Training these models requires large, representative datasets and for most IR tasks, such data contains sensitive information from users. Privacy and confidentiality concerns prevent many data owners from sharing the data, thus today the research community can only benefit from research on large-scale datasets in a limited manner. In this paper, we discuss privacy preserving mimic learning, i.e., using predictions from a privacy preserving trained model instead of labels from the original sensitive training data as a supervision signal. We present the results of preliminary experiments in which we apply the idea of mimic learning and privacy preserving mimic learning for the task of document re-ranking as one of the core IR tasks. This research is a step toward laying the ground for enabling researchers from data-rich environments to share knowledge learned from actual users' data, which should facilitate research collaborations.

[1]  W. Bruce Croft,et al.  Neural Ranking Models with Weak Supervision , 2017, SIGIR.

[2]  Dejing Dou,et al.  Differential Privacy Preservation for Deep Auto-Encoders: an Application of Human Behavior Prediction , 2016, AAAI.

[3]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[4]  Somesh Jha,et al.  Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures , 2015, CCS.

[5]  Yuan Tang,et al.  TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning , 2016, ArXiv.

[6]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[7]  Martín Abadi,et al.  Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data , 2016, ICLR.

[8]  Chen Sun,et al.  Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  M. de Rijke,et al.  Getting Started with Neural Models for Semantic Matching in Web Search , 2016, ArXiv.

[10]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[11]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[12]  Rich Caruana,et al.  Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[13]  Vitaly Shmatikov,et al.  Privacy-preserving deep learning , 2015, Allerton.

[14]  Fan Zhang,et al.  Stealing Machine Learning Models via Prediction APIs , 2016, USENIX Security Symposium.

[15]  Bhaskar Mitra,et al.  Neural Text Embeddings for Information Retrieval , 2017, WSDM.

[16]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[17]  Claudio Carpineto,et al.  Semantic Search Log k-Anonymization with Generalized k-Cores of Query Concept Graph , 2013, ECIR.