pForest: In-Network Inference with Random Forests

The concept of "self-driving networks" has recently emerged as a possible solution for managing the ever-growing complexity of modern network infrastructures. In a self-driving network, network devices adapt their decisions in real time by observing network traffic and by performing in-line inference according to machine learning models. The recent advent of programmable data planes gives us a unique opportunity to implement this vision. One open question, though, is whether these devices are powerful enough to run such complex tasks. We answer positively by presenting pForest, a system for performing in-network inference according to supervised machine learning models on top of programmable data planes. The key challenge is to design classification models that fit the constraints of programmable data planes (e.g., no floating-point arithmetic, no loops, and limited memory) while providing high accuracy. pForest addresses this challenge in three phases: (i) it optimizes feature selection according to the capabilities of programmable network devices; (ii) it trains random forest models tailored to different phases of a flow; and (iii) it applies these models in real time, on a per-packet basis. We fully implemented pForest in Python (training) and in P4_16 (inference). Our evaluation shows that pForest can classify traffic at line rate for hundreds of thousands of flows, with an accuracy that is on par with software-based solutions. We further show the practicality of pForest by deploying it on existing hardware devices (Barefoot Tofino).
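To make the data-plane constraints concrete, the following is a minimal Python sketch of per-packet, integer-only random forest inference. It is an illustration under stated assumptions, not pForest's actual implementation: the feature names, tree shapes, thresholds, class labels ("dns", "web", "video"), and the FORESTS_BY_PHASE mapping are all hypothetical. Every comparison uses integers only, and the tree walk is bounded by the tree depth, so a hardware version could plausibly unroll it into a fixed sequence of steps.

```python
from collections import Counter

# A tree node is either ("leaf", label) or
# ("split", feature_name, integer_threshold, left_subtree, right_subtree).
# All trees, thresholds, and labels below are made up for illustration.
TREE_A = ("split", "max_len", 1000,
          ("split", "pkt_count", 3, ("leaf", "dns"), ("leaf", "web")),
          ("leaf", "video"))
TREE_B = ("split", "total_bytes", 2500, ("leaf", "web"), ("leaf", "video"))
TREE_C = ("split", "min_len", 100, ("leaf", "dns"), ("leaf", "video"))

# Hypothetical mapping: forest to use once a flow has seen at least N packets.
FORESTS_BY_PHASE = {1: [TREE_C], 3: [TREE_A, TREE_B, TREE_C]}


def update_features(state, pkt_len):
    """Update per-flow counters using integer arithmetic only."""
    if state is None:
        return {"pkt_count": 1, "total_bytes": pkt_len,
                "min_len": pkt_len, "max_len": pkt_len}
    return {
        "pkt_count": state["pkt_count"] + 1,
        "total_bytes": state["total_bytes"] + pkt_len,
        "min_len": min(state["min_len"], pkt_len),
        "max_len": max(state["max_len"], pkt_len),
    }


def classify_tree(node, features):
    """Walk one tree; each step is a single integer comparison.

    The loop runs at most (tree depth) times, so it can be unrolled
    on a target that does not support loops.
    """
    while node[0] == "split":
        _, name, threshold, left, right = node
        node = left if features[name] <= threshold else right
    return node[1]


def classify_packet(flows, flow_id, pkt_len):
    """Update the flow's features, pick the forest for its phase, and vote."""
    state = update_features(flows.get(flow_id), pkt_len)
    flows[flow_id] = state
    # Latest phase whose packet-count requirement the flow already meets.
    phase = max(p for p in FORESTS_BY_PHASE if p <= state["pkt_count"])
    votes = Counter(classify_tree(t, state) for t in FORESTS_BY_PHASE[phase])
    return votes.most_common(1)[0][0]
```

Calling classify_packet(flows, flow_id, pkt_len) once per packet, with flows starting as an empty dict, returns the current majority-vote label for that flow. Keeping only a handful of integer counters per flow is what would let such a scheme fit in limited switch memory.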
