Correlating Flow-based Network Measurements for Service Monitoring and Network Troubleshooting

The resilience of network services is continuously challenged by component failures, misconfigured devices, natural disasters, and malicious users. It is therefore an important but difficult task for network operators and service administrators to manage their infrastructure carefully in order to ensure high availability. In this thesis, we contribute novel service monitoring and troubleshooting applications based on flow-based network measurements to help operators address this challenge. Flow-level measurement data such as IPFIX or NetFlow typically provides statistical summaries of the connections crossing a network, including the number of exchanged bytes and packets. Flow-level data can be collected by off-the-shelf hardware used in backbone networks, which allows Internet Service Providers (ISPs) to monitor large-scale networks with a limited number of sensors. However, the range of security and network management questions that can be answered directly from flow-based data is strongly limited, because only a small amount of information is collected per connection. In this work, we overcome this problem by correlating and analyzing sets of flows across different dimensions such as time, address space, or user groups. The otherwise hidden information that such correlation reveals proves very beneficial for flow-based troubleshooting applications. Using this approach, we show how flow-based data can be used to effectively support mail administrators in fighting spam. In more detail, we demonstrate that certain spam filtering decisions performed by mail servers can be accurately tracked at the ISP level using flow-level data. We then argue that such aggregated knowledge from multiple e-mail domains allows ISPs not only to remotely monitor what their “own” servers are doing, but also to develop and evaluate new scalable methods for fighting spam. To assist network operators with troubleshooting connectivity problems,
