Since heavy flows account for a significant fraction of network traffic, the ability to predict them benefits many network management applications, including mitigating link congestion, scheduling network capacity, and exposing network attacks. Existing machine learning based predictors are largely implemented on the control plane of the Software Defined Networking (SDN) paradigm. As a result, frequent communication between the control and data planes can cause unnecessary overhead and additional delay in decision making. In this paper, we present pHeavy, a machine learning based scheme that predicts heavy flows directly on the programmable data plane, thus eliminating the network overhead and latency of consulting the SDN controller. Because memory is scarce and computation capability is limited on the programmable data plane, pHeavy includes a packet processing pipeline that deploys pre-trained decision tree models for in-network prediction. We have implemented pHeavy on both the bmv2 software switch and a P4 hardware switch (i.e., Barefoot Tofino). Evaluation results demonstrate that pHeavy achieves 85% and 98% accuracy after receiving the first 5 and 20 packets of a flow, respectively, while reducing the size of the decision tree by 5.4x on average. More importantly, pHeavy can predict heavy flows at line rate on the P4 hardware switch.
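To make the idea of deploying a pre-trained decision tree in a packet processing pipeline more concrete, the following is a minimal sketch of one common way to do so: flattening a tree into per-feature range rules, which resemble entries in a match-action table. It is not pHeavy's actual pipeline. The feature names (`bytes_first5`, `mean_iat_us`), the thresholds, and the tree itself are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

INF = float("inf")


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure)."""
    feature: Optional[str] = None   # None marks a leaf
    threshold: float = 0.0          # go left when value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None     # set only on leaves


# A rule maps each tested feature to a half-open range (lo, hi] plus a label.
Rule = Tuple[Dict[str, Tuple[float, float]], str]


def tree_to_rules(node: Node, bounds=None) -> List[Rule]:
    """Flatten the tree: every root-to-leaf path becomes one range rule."""
    bounds = dict(bounds or {})
    if node.feature is None:
        return [(bounds, node.label)]
    lo, hi = bounds.get(node.feature, (-INF, INF))
    left = {**bounds, node.feature: (lo, min(hi, node.threshold))}
    right = {**bounds, node.feature: (max(lo, node.threshold), hi)}
    return tree_to_rules(node.left, left) + tree_to_rules(node.right, right)


def match(rules: List[Rule], features: Dict[str, float]) -> str:
    """Return the label of the first rule whose ranges all contain the features."""
    for bounds, label in rules:
        if all(lo < features[f] <= hi for f, (lo, hi) in bounds.items()):
            return label
    raise ValueError("no rule matched")


def predict(node: Node, features: Dict[str, float]) -> str:
    """Reference traversal of the tree, used to check the flattened rules."""
    while node.feature is not None:
        go_left = features[node.feature] <= node.threshold
        node = node.left if go_left else node.right
    return node.label


# Hypothetical tree over features gathered from a flow's first 5 packets.
TREE = Node(
    "bytes_first5", 3000,
    left=Node(label="light"),
    right=Node(
        "mean_iat_us", 500,
        left=Node(label="heavy"),
        right=Node(label="light"),
    ),
)

RULES = tree_to_rules(TREE)
```

Each rule produced by `tree_to_rules` corresponds to one leaf, so a smaller tree directly means fewer table entries, which matters when switch memory is scarce. Calling `match(RULES, flow_features)` gives the same label as walking the tree with `predict(TREE, flow_features)`.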