A Utility-Theoretic Approach to Privacy in Online Services

Online offerings such as web search, news portals, and e-commerce applications face the challenge of providing high-quality service to a large, heterogeneous user base. Recent efforts have highlighted the potential to improve performance by personalizing services based on special knowledge about users and their context. For example, a user's demographics, location, and past search and browsing activity may be useful in enhancing the results offered in response to web search queries. However, reasonable concerns about privacy among users, providers, and government agencies acting on behalf of citizens may limit services' access to such information. We introduce and explore an economics of privacy in personalization, where people can opt to share personal information, in a standing or on-demand manner, in return for expected enhancements in the quality of an online service. We focus on the example of web search and formulate realistic objective functions for search efficacy and privacy. We demonstrate how to efficiently find a provably near-optimal solution to the utility-privacy tradeoff. We evaluate our methodology on data drawn from a log of the search activity of volunteer participants. We separately assess users' preferences about privacy and utility via a large-scale survey, aimed at eliciting people's willingness to share personal data in return for gains in search efficiency. We show that a significant level of personalization can be achieved using a relatively small amount of information about users.
