What's the Gist? Privacy-Preserving Aggregation of User Profiles

Over the past few years, online service providers have started gathering increasing amounts of personal information to build user profiles and monetize them with advertisers and data brokers. Users have little control over what information is processed and are often left with an all-or-nothing decision between receiving free services and refusing to be profiled. This paper explores an alternative approach in which users disclose only an aggregate model (the "gist") of their data. We aim to preserve data utility while simultaneously providing user privacy. We show that this approach can be efficiently supported by letting users contribute encrypted and differentially private data to an aggregator. The aggregator combines the encrypted contributions and can extract only an aggregate model of the underlying data. We evaluate our framework on a dataset of 100,000 U.S. users obtained from the U.S. Census Bureau and show that (i) it provides accurate aggregates with as few as 100 users, (ii) it can generate revenue for both users and data brokers, and (iii) its overhead is low.
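To make the idea of "encrypted and differentially private contributions" concrete, the following is a minimal sketch and not the paper's actual protocol. It assumes a simplified additive-masking scheme: each user perturbs a numeric attribute with Laplace noise, encodes it as a fixed-point integer, and adds a random mask, where the masks of all users sum to zero modulo a public modulus. The names `MODULUS`, `SCALE`, `zero_sum_masks`, `user_contribution`, and `aggregate` are hypothetical, and Python's `random` module stands in for a cryptographically secure key setup.

```python
import random

MODULUS = 2**61 - 1   # hypothetical public modulus, large enough to hold any sum
SCALE = 1000          # fixed-point factor used to encode noisy real values


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def zero_sum_masks(n, rng):
    """Return n masks that sum to 0 mod MODULUS.

    Assumption: a trusted setup hands one mask to each user. A real
    deployment would derive these from cryptographic keys instead.
    """
    masks = [rng.randrange(MODULUS) for _ in range(n - 1)]
    masks.append(-sum(masks) % MODULUS)
    return masks


def user_contribution(value, mask, noise_scale, rng):
    """Run on the user's side: add noise, encode, then hide with the mask."""
    noisy = value + laplace_noise(noise_scale, rng)
    encoded = round(noisy * SCALE)
    return (encoded + mask) % MODULUS


def aggregate(contributions):
    """Run on the aggregator's side: masks cancel only in the full sum."""
    total = sum(contributions) % MODULUS
    if total > MODULUS // 2:      # map back to a signed value
        total -= MODULUS
    return total / SCALE


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [30, 45, 27, 61]                      # illustrative values only
    masks = zero_sum_masks(len(ages), rng)
    sent = [user_contribution(a, m, 2.0, rng) for a, m in zip(ages, masks)]
    print("noisy sum of ages:", aggregate(sent))
```

In this sketch, an individual contribution looks like a uniformly random number to the aggregator, and only the sum over all users reveals a (noisy) aggregate. The noise scale of `2.0` is an arbitrary illustration, not a calibrated privacy parameter.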
