RASP: Real-time network analytics with distributed NoSQL stream processing

In this paper we present RASP, a system that combines state-of-the-art distributed stream processing and NoSQL engines to enable real-time, low-latency storage of incoming data streams and their joining with external datasets of arbitrary size in an extensible, SQL-compliant manner. We achieve low-latency, real-time execution by employing the Kafka and Storm frameworks to join incoming tuples as they arrive, while the denormalized result is stored in HBase, a distributed NoSQL engine, via Phoenix, a framework that fully supports SQL. We fine-tune the topology execution to achieve maximum performance, and we apply a set of optimizations to both the HBase storage and the Phoenix SQL execution framework. We use RASP to solve a network analytics problem using real data. RASP performs its computations through an extensible pipeline of Storm bolts that incrementally augment incoming tuples by executing different algorithms. We deploy our system over an IaaS cloud and evaluate its performance for various workloads, cluster sizes and configurations; in some cases RASP achieves a throughput increase of more than 140% and a latency drop of more than 65% compared to a vanilla setting.
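
As an illustration only, the idea of a pipeline of bolts that incrementally augment each incoming tuple can be sketched in plain Python, without Storm, Kafka, HBase or Phoenix. Everything named here is hypothetical and not taken from RASP: the field names (`src_ip`, `bytes`), the `AS_BY_PREFIX` lookup table standing in for an external dataset, and the two enrichment steps.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Bolt = Callable[[Record], Record]

# Hypothetical external dataset: two-octet IP prefix -> network label.
# In a real deployment this would be a lookup against an external store.
AS_BY_PREFIX: Dict[str, str] = {"10.0": "AS-INTERNAL", "192.168": "AS-LAB"}


def join_as_bolt(record: Record) -> Record:
    """Join the tuple with the external dataset on its source IP prefix."""
    prefix = ".".join(str(record["src_ip"]).split(".")[:2])
    return {**record, "src_as": AS_BY_PREFIX.get(prefix, "UNKNOWN")}


def classify_size_bolt(record: Record) -> Record:
    """Add a derived field; the 1500-byte threshold is an arbitrary choice."""
    label = "large" if int(record["bytes"]) >= 1500 else "small"
    return {**record, "size_class": label}


def run_pipeline(stream: Iterable[Record], bolts: List[Bolt]) -> Iterator[Record]:
    """Pass each tuple through every bolt in order as it arrives."""
    for record in stream:
        for bolt in bolts:
            record = bolt(record)
        # The fully augmented (denormalized) tuple is what would be stored.
        yield record


if __name__ == "__main__":
    incoming = [
        {"src_ip": "10.0.3.7", "bytes": 2000},
        {"src_ip": "8.8.8.8", "bytes": 64},
    ]
    for row in run_pipeline(incoming, [join_as_bolt, classify_size_bolt]):
        print(row)
```

Under these assumptions, extending the pipeline amounts to appending another function to the list of bolts; each step sees the fields added by the steps before it.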