Wormhole: a novel big data platform for 100 Gbit/s network monitoring and beyond

Internet measurement and analysis is increasingly challenging as the Internet evolves, primarily due to shifting traffic trends, rising link speeds, and new protocols and ciphers. Specialized monitoring equipment can cope with these demands, but its cost impedes deployment at very large scale. As an alternative, big data distributed architectures have been proposed for network monitoring and analysis. However, given the throughput of current 100 Gbit/s links, state-of-the-art big data solutions fall short of capacity unless a very large number of machines is used. To tackle this issue effectively, we have created Wormhole: a streaming engine that circumvents these limitations by distributing the incoming messages/packets coherently among different off-the-shelf analysis machines, thus reducing cost and equipment. Should the incoming data rate exceed the system throughput, a distributed file system can be used for temporary storage, enabling subsequent filtering and in-depth analysis. The proposed solution provides on-line real-time monitoring metrics with the ability to gain further insight when required. The prototyped architecture is able to deal with 100 Gbit/s networks and can be scaled to higher rates simply by adding computing nodes and/or by trimming encrypted packet payloads.
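The key enabler of the abstract's "coherent" packet distribution is that all packets of a flow, in both directions, must reach the same analysis node so per-flow state stays local. A minimal sketch of that idea, using a symmetric 5-tuple hash (the function names and hashing scheme below are illustrative assumptions, not Wormhole's actual implementation):

```python
# Hypothetical sketch: symmetric flow hashing so both directions of a
# TCP/UDP flow land on the same off-the-shelf analysis node.
import hashlib


def flow_key(src_ip: str, dst_ip: str,
             src_port: int, dst_port: int, proto: int) -> tuple:
    """Canonically order the endpoints so (src, dst) and (dst, src)
    produce the same key, i.e. the hash is direction-symmetric."""
    a = (src_ip, src_port)
    b = (dst_ip, dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo, hi, proto)


def assign_node(key: tuple, num_nodes: int) -> int:
    """Map a flow key to one of num_nodes analysis machines."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


# Both directions of the same TCP connection map to the same node:
fwd = flow_key("10.0.0.1", "10.0.0.2", 12345, 443, 6)
rev = flow_key("10.0.0.2", "10.0.0.1", 443, 12345, 6)
assert assign_node(fwd, 8) == assign_node(rev, 8)
```

This is the same property that symmetric receive-side scaling provides in NIC hardware; doing it in the distribution layer keeps the analysis nodes stateless with respect to routing, so scaling up is a matter of raising `num_nodes`.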
