Differential Privacy for Information Retrieval

Privacy is a real concern for any research that uses user data, and Information Retrieval (IR) is no exception. Many IR algorithms and applications rely on users' personal information, contextual information, and other sensitive, private data. The extensive use of personalization in IR has become a double-edged sword. Sometimes the concern becomes so overwhelming that IR research has to stop to avoid privacy leaks. The good news is that increasing attention has recently been paid to the joint field of privacy and IR, known as privacy-preserving IR. As part of that effort, this tutorial offers an introduction to differential privacy (DP), one of the most advanced techniques in privacy research, and provides the theoretical knowledge needed to apply privacy techniques in IR. Differential privacy is a technique that provides strong privacy guarantees for data protection. In theory, it aims to maximize the utility of statistical datasets while minimizing the risk of exposing individual data entries to any adversary. Differential privacy has been applied successfully to a wide range of applications in databases (DB) and data mining (DM). Research in privacy-preserving IR is relatively new; however, studies have shown that DP is also effective in supporting multiple IR tasks. This tutorial aims to lay a theoretical foundation for DP and to explain how it can be applied to IR. It highlights the differences between IR tasks and DB and DM tasks, and shows how DP connects to IR. We hope that attendees will leave with a good understanding of DP and the other knowledge needed to work in the newly minted joint research field of privacy and IR.
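To make the utility-versus-exposure trade-off concrete, the following is a minimal sketch of the standard Laplace mechanism applied to a query-log count. It is an illustration, not necessarily the formulation the tutorial presents. The function name `laplace_count`, the parameters `epsilon` and `sensitivity`, and the example count are all assumptions introduced here.

```python
import random


def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy count using the Laplace mechanism (illustrative sketch).

    Assumes that adding or removing one user's record changes the count by
    at most `sensitivity`. A smaller `epsilon` means more noise and hence
    stronger privacy but lower utility.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon  # Laplace scale parameter b
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Hypothetical example: how many users issued a given query.
noisy = laplace_count(true_count=100, epsilon=0.5)
```

Each call draws fresh noise, so repeated releases of the same count consume additional privacy budget; the sketch does not track that budget.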
