Privacy, Personalization, and the Web: A Utility-Theoretic Approach

Online offerings such as web search face the challenge of providing high-quality service to a large, heterogeneous user base. Recent efforts have highlighted the potential to improve performance by introducing methods to personalize services based on special knowledge about users. For example, a user's location, demographics, and past search and browsing activity may be useful in enhancing the efficiency and accuracy of web search. However, reasonable privacy concerns on the part of both users and providers limit the access that services have to such information. We explore the rich space of possibilities in which people can opt to share personal information, in a standing or a real-time manner, in return for expected enhancements in the quality of an online service. We present methods and studies that address such tradeoffs between privacy and utility in online services. We introduce concrete and realistic objective functions for efficacy and privacy and demonstrate how we can efficiently find a provably near-optimal solution to the utility-privacy tradeoff. We evaluate our methodology on data drawn from a large-scale web search log of people who volunteered to have their logs explored so as to contribute to enhancing search performance. In order to incorporate personal preferences about privacy and utility, and the willingness to trade off revealing some quantity of personal data to a search system in return for gains in efficiency, we performed a user study with 1400 participants. Employing utility and preferences estimated from the real-world data, we show that a significant level of personalization can be achieved using only a small amount of information about users.
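To make the utility-privacy tradeoff concrete, the following is a minimal illustrative sketch, not the procedure used in this work. It assumes a hypothetical utility with diminishing returns, a hypothetical additive privacy cost, and a tradeoff weight `lam`; the attribute names and all numbers are invented for illustration. It picks attributes with a simple greedy heuristic, which carries no general approximation guarantee for this kind of objective.

```python
from math import prod


def utility(selected, gain):
    """Hypothetical utility with diminishing returns: 1 - prod(1 - gain_i)."""
    return 1.0 - prod(1.0 - gain[a] for a in selected)


def privacy_cost(selected, cost):
    """Hypothetical additive privacy cost of sharing the selected attributes."""
    return sum(cost[a] for a in selected)


def greedy_tradeoff(gain, cost, lam):
    """Greedily add the attribute that most improves utility - lam * cost.

    Stops when no remaining attribute improves the combined score.
    Returns the chosen attribute set and its score.
    """
    def score(attrs):
        return utility(attrs, gain) - lam * privacy_cost(attrs, cost)

    selected = set()
    current = score(selected)
    while True:
        best_attr, best_score = None, current
        # Sorted iteration keeps the result deterministic.
        for a in sorted(gain.keys() - selected):
            s = score(selected | {a})
            if s > best_score + 1e-12:
                best_attr, best_score = a, s
        if best_attr is None:
            return selected, current
        selected.add(best_attr)
        current = best_score


# Invented example values, for illustration only.
gain = {"location": 0.5, "past_queries": 0.4, "demographics": 0.1}
cost = {"location": 0.1, "past_queries": 0.1, "demographics": 0.4}

chosen, value = greedy_tradeoff(gain, cost, lam=1.0)
print(sorted(chosen), round(value, 3))
```

With these invented numbers, the sketch shares only two of the three attributes: the third adds little utility relative to its assumed privacy cost. Raising `lam` makes the selection more conservative, down to sharing nothing at all.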
