Collaborative, Privacy-Preserving Data Aggregation at Scale

Combining and analyzing data collected at multiple administrative locations is critical for a wide variety of applications, such as detecting malicious attacks or computing an accurate estimate of the popularity of Web sites. However, legitimate concerns about privacy often inhibit participation in collaborative data aggregation. In this paper, we design, implement, and evaluate a practical solution for privacy-preserving data aggregation (PDA) among a large number of participants. Scalability and efficiency are achieved through a "semi-centralized" architecture that divides responsibility between a proxy that obliviously blinds the client inputs and a database that aggregates values by (blinded) keywords and identifies those keywords whose values satisfy some evaluation function. Our solution leverages a novel cryptographic protocol that provably protects the privacy of both the participants and the keywords, provided that the proxy and the database do not collude, even if each party may individually be malicious. Our prototype implementation can handle over a million suspect IP addresses per hour when deployed across only two quad-core servers, and its throughput scales linearly with additional computational resources.
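The division of labor between proxy and database can be illustrated with a minimal Python sketch. It is not the protocol of the paper: a keyed HMAC stands in for the blinding step, so the proxy here sees the raw keywords, whereas the paper's proxy blinds them obliviously. The names `blind`, `Database`, `submit`, and `flagged` are hypothetical, and the evaluation function is assumed to be a simple count threshold.

```python
import hashlib
import hmac
from collections import defaultdict


def blind(proxy_key: bytes, keyword: str) -> str:
    """Proxy side: map a keyword to a deterministic blinded token.

    Assumption: HMAC-SHA256 is a non-oblivious stand-in for the
    paper's blinding protocol, used only to show the data flow.
    """
    return hmac.new(proxy_key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()


class Database:
    """Database side: aggregates values per blinded keyword only."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts = defaultdict(int)

    def submit(self, blinded_keyword: str, value: int = 1) -> None:
        # The database never sees the plaintext keyword.
        self.counts[blinded_keyword] += value

    def flagged(self) -> set:
        # Assumed evaluation function: aggregate value >= threshold.
        return {k for k, v in self.counts.items() if v >= self.threshold}


# Three hypothetical participants report suspect IP addresses.
proxy_key = b"secret-held-only-by-the-proxy"
reports = [
    ["203.0.113.7", "198.51.100.2"],
    ["203.0.113.7"],
    ["203.0.113.7", "192.0.2.55"],
]

db = Database(threshold=3)
for participant_report in reports:
    for ip in participant_report:
        db.submit(blind(proxy_key, ip))

flagged_tokens = db.flagged()
```

In this sketch only the token for `203.0.113.7`, reported by all three participants, reaches the threshold. The database learns which blinded tokens are flagged but cannot map them back to addresses without the proxy's key, which mirrors the non-collusion assumption stated in the abstract.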
