Unsupervised Machine Learning-Based Elephant and Mice Flow Identification

The Internet today carries traffic from a wide range of applications, each with different requirements and constraints on network resources. It is therefore normal to find flows with dissimilar features contending for those resources, and the clear consequence is that particular flows use them unfairly. Analysis of real network traffic has linked this problem to the so-called elephant and mice flows, and it can degrade network performance. In this paper, we propose a framework to optimize network performance by characterizing elephant and mice flows based on network performance metrics. The framework has three parts. In the first part, principal component analysis (PCA) is used to reduce dimensionality. The second part partitions the traffic into distinct groups based on performance metrics such as packet loss, round trip time (RTT), and throughput, using unsupervised k-means clustering. Finally, within each cluster, flows are identified as huge (elephant) or small (mice) based on threshold values for predefined parameters. We analyzed a 2 GB pcap file to build our dataset. Our results show that there is potential in using network performance features to cluster network traffic and to identify mice and elephant flows based on the number of packets, flow size, and flow duration. The proposed framework is thus capable of characterizing mice and elephant flows within each cluster based on network performance metrics.
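The clustering and thresholding stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the flow records, the threshold values, the choice of k = 2, and the rule that a flow must exceed all three thresholds to count as an elephant are assumptions made for illustration, and the PCA stage is omitted because only three performance features are used here.

```python
import math

# Hypothetical flow records (all values invented for illustration):
# (loss %, RTT ms, throughput Mbps, packets, bytes, duration s)
FLOWS = [
    (0.1, 20.0, 90.0, 50000, 60_000_000, 120.0),
    (0.2, 22.0, 85.0, 40, 30_000, 0.5),
    (2.5, 180.0, 5.0, 30000, 25_000_000, 300.0),
    (3.0, 200.0, 4.0, 12, 9_000, 0.2),
    (0.1, 18.0, 95.0, 25, 20_000, 0.3),
    (2.8, 190.0, 6.0, 20, 15_000, 0.4),
]

# Assumed thresholds; the source does not state the values it used.
MIN_PACKETS = 1000
MIN_BYTES = 1_000_000
MIN_DURATION_S = 10.0


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(list(farthest))
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def identify(flows, k=2):
    """Cluster flows on performance metrics, then label each by size thresholds."""
    perf = standardize([f[:3] for f in flows])  # loss, RTT, throughput
    clusters = kmeans(perf, k)
    result = []
    for flow, cluster in zip(flows, clusters):
        packets, size, duration = flow[3:]
        is_elephant = (
            packets >= MIN_PACKETS
            and size >= MIN_BYTES
            and duration >= MIN_DURATION_S
        )
        result.append((cluster, "elephant" if is_elephant else "mouse"))
    return result


if __name__ == "__main__":
    for flow, (cluster, kind) in zip(FLOWS, identify(FLOWS)):
        print(cluster, kind, flow)
```

Clustering uses only the performance metrics, while the elephant or mouse label comes from packet count, flow size, and duration, so each performance cluster can contain both kinds of flow. With more input features, a PCA projection would precede the `kmeans` call.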
