Quantifying the accuracy of the ground truth associated with Internet traffic traces

Ground truth information for Internet traffic traces is often derived by means of port analysis and payload inspection (Deep Packet Inspection, DPI). In this paper we analyze the errors that DPI and port analysis make when assigning protocol labels to traffic traces. We compare the ground truth provided by these approaches with that derived by gt, a tool that we developed, which provides error-free ground truth at the application level by construction. Experimental results demonstrate that, depending on the protocols composing a trace, ground truth information can be incorrect for up to 91% of the labeled bytes with port analysis and up to 26% with DPI.
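
The byte-level error figures above can be read as the share of labeled bytes whose label disagrees with a reference label. The following is a minimal sketch of that computation, not the paper's implementation: the `Flow` record, the `"unknown"` marker for unlabeled flows, and the choice to exclude unlabeled bytes from the denominator are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical per-flow record; the field names are illustrative."""
    n_bytes: int
    reference_label: str  # label from the reference ground truth (e.g. gt)
    assigned_label: str   # label from the method under test (ports or DPI)


def byte_error_rate(flows, unlabeled="unknown"):
    """Fraction of labeled bytes whose assigned label differs from the reference.

    Flows that the method under test left unlabeled are skipped (an assumption).
    """
    labeled = [f for f in flows if f.assigned_label != unlabeled]
    total = sum(f.n_bytes for f in labeled)
    if total == 0:
        return 0.0
    wrong = sum(f.n_bytes for f in labeled
                if f.assigned_label != f.reference_label)
    return wrong / total


# Made-up example data, not taken from the paper.
flows = [
    Flow(900, "bittorrent", "http"),   # mislabeled: 900 bytes
    Flow(100, "http", "http"),         # correct: 100 bytes
    Flow(500, "skype", "unknown"),     # unlabeled: ignored
]
rate = byte_error_rate(flows)
print(f"{rate:.0%} of labeled bytes are mislabeled")
```

Weighting by bytes rather than by flows means a few large mislabeled flows can dominate the error, which is one reason the result depends on the protocol mix of a trace.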
