Generalization and optimization of feature set for accurate identification of P2P Traffic in the internet using neural network

P2P applications are believed to account for a substantial proportion of today's Internet traffic. The ability to accurately identify different P2P applications in Internet traffic is important to a broad range of network operations, including application-specific traffic engineering, capacity planning, resource provisioning, and service differentiation. In this paper, we present an approach that accurately identifies P2P traffic using a Multi-Layer Perceptron (MLP) neural network. It is general practice to reduce the cost of classification by reducing the number of features with a feature selection algorithm. The reduced feature sets produced by such algorithms are highly data-dependent and differ from one data set to another. Furthermore, a feature set derived from one data set does not yield good results when applied to other data sets. We propose an optimum and universal set of features that is independent of the training and test data sets. The proposed feature set enables a significant improvement in the performance of the MLP classifier. Because it contains only a few features, it also significantly reduces training time while maintaining performance, making the approach suitable for real-time implementation.
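
As a rough illustration of the idea (a fixed, small feature subset fed to an MLP that is trained on one data set and evaluated on another), the following minimal sketch uses only the Python standard library. Everything in it is an assumption made for illustration: the flows are synthetic, the six features and the subset `SELECTED` are hypothetical, and the network size and learning rate are arbitrary. It is not the feature set, data, or architecture used in the paper.

```python
import math
import random

# Hypothetical fixed feature subset (indices into a 6-feature flow record).
# These indices are illustrative only, not the paper's feature set.
SELECTED = [0, 3]

def make_flows(n, seed):
    """Generate synthetic (features, label) pairs; label 1 = P2P, 0 = other."""
    rng = random.Random(seed)
    flows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(6)]
        shift = 3.0 if label == 1 else -3.0
        x[0] += shift  # only features 0 and 3 carry class information
        x[3] -= shift
        flows.append((x, label))
    return flows

def select(flows, indices):
    """Keep only the chosen feature columns of every flow."""
    return [([x[i] for i in indices], label) for x, label in flows]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """One tanh hidden layer and a sigmoid output, trained with plain SGD."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train(self, data, epochs=50, lr=0.1):
        for _ in range(epochs):
            for x, t in data:
                h, y = self.forward(x)
                d_out = y - t  # cross-entropy gradient at the output unit
                for j in range(len(h)):
                    # Backpropagate through tanh before updating w2[j].
                    d_h = d_out * self.w2[j] * (1.0 - h[j] ** 2)
                    self.w2[j] -= lr * d_out * h[j]
                    self.b1[j] -= lr * d_h
                    for i in range(len(x)):
                        self.w1[j][i] -= lr * d_h * x[i]
                self.b2 -= lr * d_out

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0

def accuracy(model, data):
    return sum(model.predict(x) == t for x, t in data) / len(data)

# Train on one synthetic data set and test on a separately generated one,
# using the same fixed feature subset for both.
train_set = select(make_flows(400, seed=1), SELECTED)
test_set = select(make_flows(200, seed=2), SELECTED)
model = TinyMLP(n_in=len(SELECTED), n_hidden=4)
model.train(train_set)
test_accuracy = accuracy(model, test_set)
print(f"test accuracy on {len(SELECTED)} of 6 features: {test_accuracy:.3f}")
```

Because the subset is fixed in advance rather than chosen by a selection algorithm run on the training data, the same input layer can be reused across data sets, and the smaller input dimension reduces the work done in each training step.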
