Identifying and discriminating between web and peer-to-peer traffic in the network core

Traffic classification is the ability to identify and categorize network traffic by application type. In this paper, we consider the problem of traffic classification in the network core.Classification at the core is challenging because only partial information about the flows and their contributors is available. We address this problem by developing a framework that can classify a flow using only unidirectional flow information. We evaluated this approach using recent packet traces that we collected and pre-classified to establish a "base truth". From our evaluation, we find that flow statistics for the server-to-client direction of a TCP connection provide greater classification accuracy than the flow statistics for the client-to-server direction. Because collection of the server-to-client flow statistics may not always be feasible, we developed and validated an algorithm that can estimate the missing statistics froma unidirectional packet trace.

[1]  Anja Feldmann,et al.  An analysis of Internet chat systems , 2003, IMC '03.

[2]  Zhi-Li Zhang,et al.  Profiling internet backbone traffic: behavior models and applications , 2005, SIGCOMM '05.

[3]  Michalis Faloutsos,et al.  Is P2P dying or just hiding? [P2P traffic measurement] , 2004, IEEE Global Telecommunications Conference, 2004. GLOBECOM '04..

[4]  Carsten Lund,et al.  Flow sampling under hard resource constraints , 2004, SIGMETRICS '04/Performance '04.

[5]  Aidong Zhang,et al.  Cluster analysis for gene expression data: a survey , 2004, IEEE Transactions on Knowledge and Data Engineering.

[6]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[7]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[8]  J. Erman,et al.  QRP05-4: Internet Traffic Identification using Machine Learning , 2006, IEEE Globecom 2006.

[9]  Anirban Mahanti,et al.  Traffic classification using clustering algorithms , 2006, MineNet '06.

[10]  Renata Teixeira,et al.  Early application identification , 2006, CoNEXT '06.

[11]  Oliver Spatscheck,et al.  Accurate, scalable in-network identification of p2p traffic using application signatures , 2004, WWW '04.

[12]  Andrew W. Moore,et al.  Internet traffic classification using bayesian analysis techniques , 2005, SIGMETRICS '05.

[13]  Patrick Haffner,et al.  ACAS: automated construction of application signatures , 2005, MineNet '05.

[14]  Matthew Roughan,et al.  Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification , 2004, IMC '04.

[15]  Vern Paxson,et al.  Empirically derived analytic models of wide-area TCP connections , 1994, TNET.

[16]  Konstantina Papagiannaki,et al.  Toward the Accurate Identification of Network Applications , 2005, PAM.

[17]  Kevin Jeffay,et al.  What TCP/IP protocol headers can tell us about the web , 2001, SIGMETRICS '01.

[18]  Anthony McGregor,et al.  Flow Clustering Using Machine Learning Techniques , 2004, PAM.

[19]  Michalis Faloutsos,et al.  Transport layer identification of P2P traffic , 2004, IMC '04.

[20]  George Varghese,et al.  Building a better NetFlow , 2004, SIGCOMM 2004.

[21]  Margaret H. Dunham,et al.  Data Mining: Introductory and Advanced Topics , 2002 .

[22]  Carey L. Williamson,et al.  A Longitudinal Study of P2P Traffic Classification , 2006, 14th IEEE International Symposium on Modeling, Analysis, and Simulation.

[23]  Taesang Choi,et al.  Content-aware Internet application traffic measurement and analysis , 2004, 2004 IEEE/IFIP Network Operations and Management Symposium (IEEE Cat. No.04CH37507).

[24]  Carey L. Williamson,et al.  Categories and Subject Descriptors: C.4 [Computer Systems Organization]Performance of Systems , 2022 .

[25]  Sebastian Zander,et al.  Automated traffic classification and application identification using machine learning , 2005, The IEEE Conference on Local Computer Networks 30th Anniversary (LCN'05)l.

[26]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[27]  Michalis Faloutsos,et al.  BLINC: multilevel traffic classification in the dark , 2005, SIGCOMM '05.

[28]  Carsten Lund,et al.  Properties and prediction of flow statistics from sampled packet streams , 2002, IMW '02.

[29]  Ramana Rao Kompella,et al.  The power of slicing in internet flow measurement , 2005, IMC '05.