Challenging statistical classification for operational usage: the ADSL case

Accurate identification of network traffic by application type is a key issue for most companies, including ISPs. For example, some companies might want to ban P2P traffic from their network, while some ISPs might want to offer additional services based on the application. To classify applications on the fly, most companies rely on deep packet inspection (DPI) solutions. While DPI tools can be accurate, their signature databases require constant updates. Recently, several statistical traffic classification methods have been proposed. In this paper, we investigate the use of these methods for an ADSL provider managing many Points of Presence (PoPs). We demonstrate that statistical methods can offer performance similar to that of DPI tools when the classifier is trained for a specific site. They can also complement existing DPI techniques by mining traffic that the DPI solution failed to identify. However, we also demonstrate that, even if a statistical classifier is very accurate on one site, the resulting model cannot be applied directly to other locations. We show that this problem stems from the statistical classifier learning site-specific information.
