FENXI: Deep-learning Traffic Analytics at the Edge

Live traffic analysis at the first aggregation point in the ISP network enables the implementation of complex traffic engineering policies, but it is limited by scarce processing capabilities, especially for Deep Learning (DL) based analytics. The introduction of specialized hardware accelerators, such as the Tensor Processing Unit (TPU), offers the opportunity to enhance the processing capabilities of network devices at the edge. Yet, to date, no packet processing pipeline is capable of offering DL-based analysis capabilities in the data plane without interfering with network operations. In this paper, we present FENXI, a system that runs complex analytics by leveraging the TPU. The design of FENXI decouples forwarding operations from traffic analytics, which operate at different granularities, i.e., packet and flow levels. We conceive two independent modules that communicate asynchronously to exchange network data and analytics results, and we design data structures to extract flow-level statistics without impacting per-packet processing. We prototyped and evaluated FENXI on general-purpose servers under both adversarial and realistic network conditions. Our analysis shows that FENXI can sustain 100 Gbps line-rate traffic processing while requiring only limited resources, and that it dynamically adapts to variable network conditions.
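The decoupling described above can be illustrated with a minimal single-process sketch. It is not the FENXI implementation: the class names (`ForwardingModule`, `AnalyticsModule`), the export threshold, the bounded queue, and the threshold-based `toy_model` standing in for DL inference on the TPU are all assumptions made for illustration. The point it shows is that the per-packet path only updates flow-level counters and never waits for a verdict, while the analytics side consumes flow records at its own pace.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowStats:
    """Flow-level statistics kept by the forwarding path (hypothetical layout)."""
    packets: int = 0
    bytes: int = 0
    sizes: list = field(default_factory=list)  # sizes of the first few packets


class ForwardingModule:
    """Per-packet path: updates counters and never blocks on analytics."""

    def __init__(self, export_after=4, queue_capacity=1024):
        self.flows = {}                # flow key -> FlowStats
        self.export_after = export_after
        self.queue_capacity = queue_capacity
        self.to_analytics = deque()    # flow records awaiting inference
        self.verdicts = {}             # flow key -> label written by analytics
        self.dropped_exports = 0

    def on_packet(self, flow_key, size):
        stats = self.flows.setdefault(flow_key, FlowStats())
        stats.packets += 1
        stats.bytes += size
        if stats.packets <= self.export_after:
            stats.sizes.append(size)
            if stats.packets == self.export_after:
                # Export once per flow; if the queue is full, skip the export
                # rather than stall packet processing.
                if len(self.to_analytics) < self.queue_capacity:
                    self.to_analytics.append((flow_key, tuple(stats.sizes)))
                else:
                    self.dropped_exports += 1
        # Forward using whatever verdict is available right now.
        return self.verdicts.get(flow_key, "default")


class AnalyticsModule:
    """Flow-level path: consumes exported records and publishes verdicts."""

    def __init__(self, model):
        self.model = model  # placeholder for the accelerator-backed DL model

    def drain(self, fwd):
        while fwd.to_analytics:
            flow_key, features = fwd.to_analytics.popleft()
            fwd.verdicts[flow_key] = self.model(features)


def toy_model(features):
    """Stand-in classifier (assumption): labels flows by mean packet size."""
    return "bulk" if sum(features) / len(features) > 1000 else "interactive"
```

In this sketch a flow is forwarded with the `"default"` policy until the analytics side has produced a label, after which subsequent packets of that flow pick it up. A real deployment would run the two modules on separate cores with a lock-free ring between them; the single-threaded `drain` call here only keeps the example deterministic.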
