User traffic classification for proxy-server based internet access control

In a LAN, Internet access must be managed well to give all users a good experience. Users who consume a larger share of the bandwidth may be restricted during peak hours so that others can use the Internet. This can be viewed as a problem of classifying users by their Internet usage into normal and high categories, after which control policies may be applied. For this purpose, a proxy-based mechanism is proposed that classifies users according to their share of Internet access. The advantage of this approach is that the proxy server can distinguish users who share the same computer, so that appropriate control policies can be applied to each of them. To understand user behaviour, data is collected at the proxy server of a campus LAN. Machine learning algorithms are then used to learn and characterise user behaviour; in particular, Naive Bayes and Gaussian Mixture Model (GMM) based classifiers are used. The algorithms are observed to scale, in that users are clustered into two distinct groups. Performance evaluation on a held-out data set indicates that users can be accurately distinguished 94.96% of the time. The approach is also practical: the time-consuming task of model building need be done offline only once a month, while the daily task of classification can be accomplished within 20 minutes for GMMs. It is also shown how the behaviour of the two groups of users may be characterised. This would be a useful aid in the design of policies and algorithms for Internet access control.
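As a rough illustration of the normal/high classification step, the following is a minimal Gaussian Naive Bayes sketch in plain Python. It is not the authors' implementation: the two features (log of daily bytes downloaded and log of daily request count), the toy training values, and the function names `fit_gaussian_nb` and `classify` are all assumptions made for the example, since the abstract does not specify which features are extracted from the proxy log.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a log-prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)

    total = len(samples)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def log_posterior(model, x):
    """Return the unnormalised log-posterior of each class for one user."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(x, means, variances):
            # Features are treated as independent Gaussians (the "naive" assumption).
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def classify(model, x):
    """Assign the class with the highest log-posterior."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)


# Hypothetical per-user daily features:
# [log10(bytes downloaded), log10(number of requests)]
train_x = [
    [7.1, 2.0], [7.4, 2.3], [6.9, 1.8], [7.2, 2.1],  # labelled "normal"
    [9.3, 3.6], [9.0, 3.4], [9.5, 3.9], [9.1, 3.5],  # labelled "high"
]
train_y = ["normal"] * 4 + ["high"] * 4

# Model building is the expensive step that would run offline.
model = fit_gaussian_nb(train_x, train_y)

# Classification of a day's users is then a cheap per-user scoring pass.
print(classify(model, [7.0, 2.0]))
print(classify(model, [9.2, 3.7]))
```

The split between `fit_gaussian_nb` and `classify` mirrors the offline model building and daily classification described above; a GMM-based classifier would replace the single Gaussian per class with a mixture of several.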
