Stochastic Privacy (Extended Version)

Online services such as web search and e-commerce applications typically rely on the collection of data about users, including details of their activities on the web. Such personal data is used to maximize revenues via targeting of advertisements and longer engagements of users, and to enhance the quality of service via personalization of content. To date, service providers have largely followed the approach of either requiring or requesting consent for collecting user data. Users may be willing to share private information in return for incentives, enhanced services, or assurances about the nature and extent of the logged data. We introduce stochastic privacy, an approach to privacy centering on the simple concept of providing people with a guarantee that the probability that their personal data will be shared does not exceed a given bound. Such a probability, which we refer to as the privacy risk, can be given by users as a preference or communicated as a policy by a service provider. Service providers can work to personalize and to optimize revenues in accordance with preferences about privacy risk. We present procedures, proofs, and an overall system for maximizing the quality of services, while respecting bounds on privacy risk. We demonstrate the methodology with a case study and evaluation of the procedures applied to web search personalization. We show how we can achieve near-optimal utility of accessing information with provable guarantees on the probability of sharing data.
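The central guarantee, that the probability of a user's data being shared does not exceed a stated privacy risk, can be illustrated with a minimal sketch. It assumes the simplest mechanism that satisfies such a bound: each user is included independently with probability equal to their own risk. The function name `sample_users` and the dictionary layout are hypothetical and chosen only for illustration; this is not the selection procedure developed in the paper, which additionally optimizes for utility.

```python
import random


def sample_users(privacy_risk, rng):
    """Pick the users whose data would be logged.

    privacy_risk maps a user id to that user's bound r in [0, 1].
    Each user is included independently with probability r, so the
    chance that any given user's data is shared equals r and therefore
    never exceeds the stated bound. (Illustrative assumption, not the
    paper's procedure.)
    """
    selected = []
    for user, r in sorted(privacy_risk.items()):
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"privacy risk for {user!r} must lie in [0, 1]")
        # rng.random() is uniform on [0, 1), so this test succeeds
        # with probability exactly r.
        if rng.random() < r:
            selected.append(user)
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    risks = {"alice": 0.0, "bob": 0.1, "carol": 1.0}
    trials = 20000
    counts = {user: 0 for user in risks}
    for _ in range(trials):
        for user in sample_users(risks, rng):
            counts[user] += 1
    for user, r in risks.items():
        print(user, "bound:", r, "empirical:", counts[user] / trials)
```

A service that wants higher utility than uniform sampling provides would need to choose among the sampled users more carefully while keeping each user's overall inclusion probability under the bound; the sketch only demonstrates the constraint itself.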
