Differential Privacy for Information Retrieval

Information Retrieval (IR) research has relied extensively on personalization to advance the state of the art. In this process, many IR algorithms and applications require users' personal information, contextual information, and other sensitive and private data. However, as IR researchers make progress, there is a persistent concern over violations of users' privacy. Sometimes the concern becomes so overwhelming that IR research has to stop to avoid leaking users' private information. The good news is that increasing attention is being paid to the joint field of privacy and IR, known as privacy-preserving IR. As part of this effort, this tutorial offers an introduction to differential privacy (DP), one of the most advanced techniques in privacy research, and provides the theoretical knowledge necessary for applying privacy techniques in IR. Differential privacy is a technique that provides strong privacy guarantees for data protection. Theoretically, it aims to maximize the utility of statistical datasets while minimizing the risk of exposing individual data entries to any adversary. Differential privacy has been applied across a wide range of applications in databases, data mining, and IR. This tutorial aims to lay a theoretical foundation for DP and to show how it can be applied to IR. We hope that attendees will come away with a good understanding of DP and the knowledge needed to work in this newly minted joint research field of privacy and IR.
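To make the utility-versus-risk trade-off concrete, the standard definition says that a randomized mechanism M is epsilon-differentially private if, for any two datasets D and D' that differ in one record and any set of outputs S, Pr[M(D) in S] <= exp(epsilon) * Pr[M(D') in S]. The sketch below illustrates one common way to meet this definition for a counting query, the Laplace mechanism. It is not taken from the tutorial itself: the function names, the toy query log, and the choice of epsilon are assumptions made for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon. A smaller epsilon
    means stronger privacy and a noisier (less useful) answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical toy query log, used only for illustration.
queries = ["flu symptoms", "weather", "flu vaccine", "news"]

# Noisy answer to "how many queries mention flu?" (the true count is 2).
noisy = private_count(queries, lambda q: "flu" in q, epsilon=0.5,
                      rng=random.Random(0))
print(noisy)
```

Each call returns the true count plus fresh noise, so any single answer may be far from 2, while the average over many calls stays close to it. Note that repeated queries over the same data consume privacy budget, which this sketch does not track.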