Privacy-preserving collaborative anomaly detection

Unwanted traffic, which includes Denial of Service attacks, worms, and spam, is a major concern on the Internet today. Identifying and mitigating it costs businesses many billions of USD every year. The process of identifying this traffic is called anomaly detection, and Intrusion Detection Systems (IDSes) are among the most prevalent techniques. IDSes such as Snort allow users to write "rules" that specify the properties of traffic that should be detected and the corrective action to be taken in response. Unfortunately, applying these rules in an online setting can be prohibitively expensive for large networks, such as Tier-1 ISPs, which may have tens of thousands of links and many Gbps of traffic. In the first chapter of this thesis we present a system that leverages machine learning algorithms to detect the same types of unwanted traffic as Snort, but on summarized data for faster processing. Our results demonstrate that this system can learn to classify traffic according to many Snort rules with a high degree of accuracy.

Distinguishing good traffic from unwanted traffic is nevertheless challenging even in an offline setting, because many types of unwanted traffic, such as network attacks, deliberately mimic the behavior of normal traffic. We therefore propose that the targets of unwanted traffic should collaborate by correlating their attack data, under the assumption that a given malicious host is likely to affect more than one victim over time. That is, the senders of unwanted traffic will use individual computers (i.e., malicious hosts) repeatedly for various nefarious purposes in order to maximize their profits, and this repeated use will leave traces across networks. In the second chapter of this thesis we present a measurement study that quantifies the potential gain from this collaborative anomaly detection. Specifically, using traces from operational networks, we calculate the fraction of detected network anomalies (viz., IP scans, port scans, and DoS attacks) that could have been mitigated if some subset of the victims had collaborated by sharing information about past perpetrators.

One major challenge with the proposed collaborative anomaly detection is that the owners and operators of participating networks are often hesitant to openly share information about the hosts (customers) that use their services. In the third chapter of this thesis we address this problem by proposing a novel cryptographic protocol that allows victims to collaborate in a manner that protects their privacy, and by evaluating its efficiency. Our protocol allows each participant to submit a set of IP addresses that it suspects might be engaging in unwanted activity, and it returns the set of IP addresses that appear in some fraction of all suspect sets (i.e., threshold set-intersection). The protocol preserves privacy because it never reveals who suspected whom, and a submitted IP address is only revealed when more than n participating networks suspect it. Our implementation of this protocol is able to correlate millions of suspect IP addresses per hour when running on two quad-core machines.
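
The input/output behavior of threshold set-intersection can be illustrated with a short, non-private sketch. It is not the cryptographic protocol from the thesis: it counts suspects in the clear and shows only which addresses would be revealed for a given threshold n. The function name, the example addresses (taken from documentation-reserved ranges), and the choice of Python are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Set


def threshold_intersection(suspect_sets: Iterable[Set[str]], n: int) -> Set[str]:
    """Return the addresses suspected by more than n participants.

    Hypothetical plaintext reference for the functionality only; a
    privacy-preserving protocol would compute the same result without
    any party seeing the individual suspect sets.
    """
    counts: Counter = Counter()
    for suspects in suspect_sets:
        # Each participant contributes at most one vote per address.
        counts.update(set(suspects))
    return {ip for ip, votes in counts.items() if votes > n}


# Three hypothetical participating networks and their suspect sets.
network_a = {"192.0.2.1", "192.0.2.7"}
network_b = {"192.0.2.1", "198.51.100.4"}
network_c = {"192.0.2.1", "198.51.100.4", "203.0.113.9"}

# With n = 1, only addresses suspected by at least two networks are revealed.
revealed = threshold_intersection([network_a, network_b, network_c], n=1)
print(sorted(revealed))
```

Addresses suspected by only one network (here 192.0.2.7 and 203.0.113.9) never appear in the output, which mirrors the property that a submitted address is revealed only when more than n networks suspect it.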


[35]  Guofei Gu,et al.  A Taxonomy of Botnet Structures , 2007, Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007).

[36]  Adrian Perrig,et al.  Perspectives: Improving SSH-style Host Authentication with Multi-Path Probing , 2008, USENIX Annual Technical Conference.

[37]  Yehuda Lindell,et al.  Efficient Protocols for Set Intersection and Pattern Matching with Security Against Malicious and Covert Adversaries , 2008, TCC.

[38]  Scott Shenker,et al.  Fighting Coordinated Attackers with Cross-Organizational Information Sharing , 2006, HotNets.

[39]  Paul Barford,et al.  A signal analysis of network traffic anomalies , 2002, IMW '02.

[40]  Renata Teixeira,et al.  Early application identification , 2006, CoNEXT '06.

[41]  Markus G. Kuhn,et al.  Analysis of a denial of service attack on TCP , 1997, Proceedings. 1997 IEEE Symposium on Security and Privacy (Cat. No.97CB36097).

[42]  Claus-Peter Schnorr,et al.  Efficient signature generation by smart cards , 2004, Journal of Cryptology.

[43]  Robert Tappan Morris,et al.  DNS performance and the effectiveness of caching , 2002, TNET.

[44]  Dawn Xiaodong Song,et al.  Privacy-Preserving Set Operations , 2005, CRYPTO.

[45]  David Mazières,et al.  RE: Reliable Email , 2006, NSDI.

[46]  Balachander Krishnamurthy,et al.  Collaborating against common enemies , 2005, IMC '05.

[47]  Peter Winkler,et al.  Comparing information without leaking it , 1996, CACM.

[48]  Aaron Hackworth,et al.  Botnets as a Vehicle for Online Crimes , 2006 .

[49]  Guofei Gu,et al.  A Taxonomy of Botnet Structures , 2007, ACSAC.

[50]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[51]  Balachander Krishnamurthy,et al.  A generic language for application-specific flow sampling , 2008, CCRV.

[52]  Taeshik Shon,et al.  A hybrid machine learning approach to network anomaly detection , 2007, Inf. Sci..

[53]  Yehuda Lindell,et al.  Efficient Protocols for Set Intersection and Pattern Matching with Security Against Malicious and Covert Adversaries , 2008, Journal of Cryptology.

[54]  Nick Feamster,et al.  Understanding the network-level behavior of spammers , 2006, SIGCOMM.

[55]  Carsten Lund,et al.  Charging from sampled network usage , 2001, IMW '01.

[56]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, FOCS 1982.

[57]  Randy H. Katz,et al.  Analyzing Cooperative Containment of Fast Scanning Worms , 2005, SRUTI.

[58]  Boris N. Oreshkin,et al.  Machine learning approaches to network anomaly detection , 2007 .

[59]  Stuart E. Schechter,et al.  Inoculating SSH Against Address Harvesting , 2006, NDSS.

[60]  Divyakant Agrawal,et al.  Duplicate detection in click streams , 2005, WWW '05.

[61]  Nick Mathewson,et al.  Tor: The Second-Generation Onion Router , 2004, USENIX Security Symposium.

[62]  Robert Tappan Morris,et al.  DNS performance and the effectiveness of caching , 2001, IMW '01.

[63]  Nicolas Ianelli,et al.  Botnets as a Vehicle for Online Crime , 2007 .

[64]  Hal Berghel,et al.  Identity theft, social security numbers, and the Web , 2000, CACM.

[65]  Eyal Kushilevitz,et al.  Private information retrieval , 1998, JACM.

[66]  David R. Karger,et al.  Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web , 1997, STOC '97.

[67]  Stefan Savage,et al.  Inside the Slammer Worm , 2003, IEEE Secur. Priv..

[68]  Moni Naor,et al.  Oblivious transfer and polynomial evaluation , 1999, STOC '99.

[69]  Zihui Ge,et al.  Lightweight application classification for network management , 2007, INM '07.

[70]  Fernando Silveira,et al.  Detectability of Traffic Anomalies in Two Adjacent Networks , 2007, PAM.

[71]  George Varghese,et al.  On Scalable Attack Detection in the Network , 2004, IEEE/ACM Transactions on Networking.

[72]  Christophe Diot,et al.  Diagnosing network-wide traffic anomalies , 2004, SIGCOMM.