Unusual internet traffic detection at network edge

Network administrators must ensure that every user on the network gets a fair share of bandwidth, identify violations of bandwidth limits, and enforce additional controls such as denying access to particular websites. To achieve this, they monitor all traffic between the campus-wide LAN and the outside Internet. This monitoring is typically done by capturing and analyzing traffic logs at the proxy server installed between the LAN and the Internet. However, such monitoring is primarily statistical in nature and yields no significant actionable results. In this work, we attempt to provide network administrators with intelligent, actionable information by analyzing and predicting Internet access behavior at the network layer using machine learning algorithms. By network layer, we mean that we characterize traffic at the IP address level. For our study, we collected Squid proxy server logs and analyzed various features of network traffic at the network and user levels. We estimate the most probable range of values for each feature and identify IP addresses that deviate from these normal ranges. We then apply four supervised machine learning algorithms to our labelled dataset and compare them on classification metrics derived from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Our results show that Decision Tree and Random Forest achieve an overall accuracy close to 95%, whereas Naive Bayes and SVM achieve an overall accuracy of around 85%.
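
The per-IP feature analysis described above can be illustrated with a minimal sketch. It assumes Squid's native access.log layout (timestamp, elapsed time, client IP, result code, bytes, method, URL, and so on) and uses only two illustrative features, request count and total bytes. The percentile band used as the "normal" range, and all function names, are assumptions made for this example; the abstract does not state how the most probable range is actually estimated.

```python
from collections import defaultdict
from statistics import quantiles


def parse_squid_line(line):
    """Return (client_ip, bytes) from one native-format Squid log line, or None."""
    fields = line.split()
    if len(fields) < 7:
        return None  # not a complete native-format record
    try:
        return fields[2], int(fields[4])  # 3rd field: client IP, 5th: bytes
    except ValueError:
        return None


def per_ip_features(lines):
    """Aggregate request count and total bytes for each client IP."""
    stats = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for line in lines:
        parsed = parse_squid_line(line)
        if parsed is None:
            continue
        ip, size = parsed
        stats[ip]["requests"] += 1
        stats[ip]["bytes"] += size
    return dict(stats)


def flag_deviating_ips(features, key, low_pct=5, high_pct=95):
    """List IPs whose feature value lies outside the [low_pct, high_pct] band.

    The percentile band is an assumed stand-in for the "most probable range".
    """
    values = [f[key] for f in features.values()]
    if len(values) < 2:
        return []  # quantiles() needs at least two data points
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    low, high = cuts[low_pct - 1], cuts[high_pct - 1]
    return sorted(ip for ip, f in features.items()
                  if f[key] < low or f[key] > high)


def accuracy(tp, fp, tn, fn):
    """Overall accuracy computed from confusion-matrix counts."""
    return (tp + tn) / (tp + fp + tn + fn)
```

Flagged addresses from a step like `flag_deviating_ips(per_ip_features(log_lines), "bytes")` could serve as candidate labels for the supervised classifiers, and `accuracy` shows how an overall accuracy figure follows from the TP, FP, TN, and FN counts.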
