A framework for tunneled traffic analysis

Research in traffic classification is moving into ever more difficult areas. Traditional techniques such as header and payload inspection no longer provide sufficient information because of the use of non-standard ports and encryption. Promising alternative methods based on the statistical behaviour of traffic flows have been proposed. Although these methods can achieve high accuracy on unencrypted traffic flows, identification of encrypted traffic flows is still in its early stages. We argue that the results to date for encrypted traffic cannot help a network device such as a firewall make any useful decision, nor is there any indication that this may be achieved in the future. We propose a novel approach to cope with encrypted peer-to-peer network-layer tunnels, which are a particular problem in schools, universities, and larger corporate networks. First, statistical techniques are used to identify the protocols present in the tunnel, a process that may take on the order of seconds. Next, based on the protocols discovered and on enterprise policies, a network device is advised to block, band-limit, or allow the whole tunnel, or a range of packet sizes within that tunnel. Preliminary research has shown that VoIP traffic can be successfully handled by this approach and that the advice given to a network device can be practically useful. Work continues to apply these techniques to other protocols, and to mixes of protocols, within peer-to-peer tunnels.
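The abstract does not say which statistical features or classifiers are used, so the two-step process (identify protocols, then advise the device) is sketched below under loudly stated assumptions. A simple packet-size-share rule stands in for the statistical classifier, and the profile ranges, the `min_share` threshold, and every name (`PROFILES`, `identify_protocols`, `advise`, `tunnel_decision`) are hypothetical placeholders rather than the authors' method or measured values.

```python
from dataclasses import dataclass

# HYPOTHETICAL protocol profiles: each protocol is assumed to dominate one
# packet-size range (bytes, inclusive). Illustrative placeholders only,
# not measurements from the paper.
PROFILES = {
    "voip": (60, 250),
    "bulk_transfer": (1200, 1500),
}


@dataclass(frozen=True)
class Advice:
    protocol: str
    size_range: tuple[int, int]
    action: str  # "allow", "band-limit" or "block"


def identify_protocols(packet_sizes, min_share=0.2):
    """Step 1 (stand-in classifier): report every protocol whose size range
    holds at least `min_share` of the packets observed in the tunnel."""
    if not packet_sizes:
        return []
    found = []
    for name, (lo, hi) in PROFILES.items():
        share = sum(lo <= s <= hi for s in packet_sizes) / len(packet_sizes)
        if share >= min_share:
            found.append(name)
    return found


def advise(packet_sizes, policy, default="allow"):
    """Step 2: map each detected protocol to an action from an enterprise
    policy, scoped to that protocol's packet-size range."""
    return [
        Advice(name, PROFILES[name], policy.get(name, default))
        for name in identify_protocols(packet_sizes)
    ]


def tunnel_decision(advice, default="allow"):
    """Collapse to a single whole-tunnel action when all detected protocols
    agree; otherwise keep per-size-range advice."""
    actions = {a.action for a in advice}
    if len(actions) <= 1:
        return ("whole-tunnel", actions.pop() if actions else default)
    return ("per-size-range", advice)


if __name__ == "__main__":
    sizes = [200] * 50 + [1400] * 30 + [600] * 20
    policy = {"voip": "allow", "bulk_transfer": "band-limit"}
    print(tunnel_decision(advise(sizes, policy)))
```

In this example half the packets fall in the assumed VoIP range and 30% in the bulk range, so both are detected; because the policy gives them different actions, the advice stays per size range (allow small packets, band-limit large ones) instead of acting on the whole tunnel. A real deployment would replace the threshold rule with a trained classifier and translate each range into a device rule such as a packet-length match.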
