One-Against-All Methodology for Features Selection and Classification of Internet Applications

Classifying traffic by Internet application, even in off-line mode, is useful for many purposes, such as attack identification, QoS prioritization, network capacity planning, and computer forensics. In classification problems it is well known that a larger number of discriminators does not necessarily increase discrimination power. This work investigates a methodology for feature selection and Internet traffic classification in which the problem of classifying one among M classes is split into M one-against-all binary classification problems, each of which may adopt a different set of discriminators. Different combinations of discriminator selection methods, classification methods, and decision algorithms can be embedded into the methodology. To investigate its performance, we used the Naive Bayes classifier both to select the set of discriminators and to perform the classification. The proposed method aims to reduce the total number of distinct discriminators used in the classification problem. The methodology was tested on the classification of traffic flows, and the experimental results showed that the number of discriminators per class can be reduced significantly while sustaining the same accuracy level.
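The one-against-all decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Gaussian Naive Bayes for each binary problem, takes the per-class discriminator subsets as given rather than selecting them, and uses "pick the class with the highest log posterior odds" as the decision algorithm. The class labels, feature indices, toy flows, and function names are all hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_gaussian(rows, features):
    """Estimate (mean, variance) of each selected feature over the rows."""
    stats = {}
    for f in features:
        values = [r[f] for r in rows]
        # A small constant keeps the variance strictly positive.
        stats[f] = (mean(values), pvariance(values) + 1e-9)
    return stats

def log_likelihood(x, stats):
    """Naive Bayes log-likelihood: sum of independent Gaussian log-densities."""
    total = 0.0
    for f, (mu, var) in stats.items():
        total += -0.5 * math.log(2 * math.pi * var) - (x[f] - mu) ** 2 / (2 * var)
    return total

def train_one_vs_all(X, y, features_per_class):
    """Train one binary (class vs. rest) model per class, each restricted
    to its own subset of discriminators."""
    models = {}
    for label, feats in features_per_class.items():
        pos = [x for x, t in zip(X, y) if t == label]
        neg = [x for x, t in zip(X, y) if t != label]
        models[label] = {
            "pos": fit_gaussian(pos, feats),
            "neg": fit_gaussian(neg, feats),
            "log_prior_ratio": math.log(len(pos) / len(neg)),
        }
    return models

def classify(x, models):
    """Assumed decision rule: the class whose binary model reports the
    highest log posterior odds wins."""
    def score(m):
        return (m["log_prior_ratio"]
                + log_likelihood(x, m["pos"])
                - log_likelihood(x, m["neg"]))
    return max(models, key=lambda label: score(models[label]))

# Hypothetical toy flows with three discriminators; index 2 is pure noise.
X = [
    [1.0, 5.0, 0.2], [1.2, 5.1, 0.9], [0.9, 4.9, 0.5],  # "web"
    [5.0, 5.0, 0.4], [5.2, 5.2, 0.1], [4.9, 4.8, 0.8],  # "p2p"
    [3.0, 1.0, 0.6], [3.1, 1.2, 0.3], [2.9, 0.9, 0.7],  # "mail"
]
y = ["web"] * 3 + ["p2p"] * 3 + ["mail"] * 3

# Each binary problem uses its own (here hand-picked) discriminator subset.
features_per_class = {"web": [0], "p2p": [0], "mail": [1]}
models = train_one_vs_all(X, y, features_per_class)
print(classify([1.1, 5.0, 0.5], models))
```

In this toy setup each binary model relies on a single discriminator and the noisy third feature is never used, which mirrors the stated goal of reducing the number of discriminators per class. In practice the subsets would come from a selection procedure rather than being fixed by hand.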
