Dealing with P2P traffic in modern networks: measurement, identification and control. (La gestion du trafic P2P dans les réseaux modernes : mesure, identification et contrôle)

Owing to the wide adoption of P2P applications, and especially P2P live streaming, P2P traffic accounts for a very large share of overall Internet traffic. In this context, this thesis proposes new instruments to measure, identify and control P2P traffic. First, regarding traffic classification, since traditional techniques have a hard time identifying P2P traffic, we propose a new behavioral classifier, Abacus, tailored for P2P live streaming. Our experiments show that Abacus, though based on simple counts of the packets and bytes exchanged by a host, is a lightweight and accurate solution for identifying P2P applications. Second, since the huge volume of traffic obliges operators to employ either flow-level monitors (e.g., NetFlow) or packet sampling to cut down the amount of measurement data, we evaluate the impact of data reduction on traffic characterization and classification. We show that Abacus can be adapted to this kind of data with only a minor loss in accuracy. We also show that, in spite of the distortion introduced by packet sampling, statistical classification remains possible provided that training and validation data are sampled at the same rate. Finally, we study a new transport protocol for P2P traffic, LEDBAT (Low Extra Delay Background Transport), the congestion control algorithm of the official BitTorrent client. This delay-based algorithm aims to provide an efficient, lower-than-best-effort service. Though faithful to its goals, the original design of LEDBAT is affected by a latecomer advantage: we identify the main cause of this unfairness and propose effective corrections that restore fairness.
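To make the delay-based principle behind LEDBAT concrete, the sketch below follows the window update rule published in RFC 6817, not the thesis's own code or its proposed corrections. The class name `LedbatSender`, the `MSS` value and the single running minimum used as base delay are illustrative assumptions; loss handling, delay filtering and the base-delay history of the RFC are left out.

```python
from dataclasses import dataclass

TARGET = 0.100       # target queuing delay in seconds (RFC 6817 caps it at 100 ms)
GAIN = 1.0           # window gain (RFC 6817 requires GAIN <= 1)
MSS = 1452           # assumed segment size in bytes, for illustration only
MIN_CWND = 2 * MSS   # lower bound on the congestion window


@dataclass
class LedbatSender:
    """Hypothetical, simplified LEDBAT sender state (no loss handling)."""
    cwnd: float = MIN_CWND
    base_delay: float = float("inf")

    def on_ack(self, one_way_delay: float, bytes_acked: int) -> float:
        # Base delay: smallest one-way delay observed so far
        # (simplification of the RFC's per-minute history).
        self.base_delay = min(self.base_delay, one_way_delay)
        # Queuing delay is whatever exceeds the base delay.
        queuing_delay = one_way_delay - self.base_delay
        # Positive below the target, negative above it.
        off_target = (TARGET - queuing_delay) / TARGET
        # Linear controller: grow when under target, shrink when over.
        self.cwnd += GAIN * off_target * bytes_acked * MSS / self.cwnd
        self.cwnd = max(self.cwnd, MIN_CWND)
        return self.cwnd
```

With this rule the window grows while the measured queuing delay stays below `TARGET` and shrinks once it exceeds it, which is how the flow yields to competing best-effort traffic before losses occur.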
