Dynamic sketching over distributed data streams

Many emerging applications impose strict requirements on query response time for various operators over distributed streaming data. As a result, approximate query answering based on accurate sketches has become an important way to process fast-arriving streams. In this paper, we propose a dynamic sketching framework that can sample elements from streams with out-of-order data arrival and that provides an error-guaranteed estimation scheme for many different operators. Within the sketch, we first extract the characteristics of uniform sampling and exponential sampling from one-pass streaming data and organize them to support (ξ, δ)-approximation for different operators, such as aggregation operators (e.g., sum, count) and quantile operators (e.g., quantiles, median). Moreover, we construct the sketch dynamically and without loss of accuracy, through operations such as sketch splitting and sketch merging, without any prior knowledge. The experimental results indicate that, compared with big data analytics systems (Spark, BlinkDB), our approach achieves 3 times higher throughput and a two-orders-of-magnitude improvement in query response time.
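The abstract relies on two properties: the uniform sample must be insensitive to arrival order, and partial sketches must merge without losing accuracy. As a rough illustration only, the sketch below uses bottom-k (random-priority) sampling, a standard mergeable uniform sampler, in place of the paper's framework. It does not reproduce the exponential-sampling component or the (ξ, δ) guarantees. The class `BottomKSample` and its methods are hypothetical names chosen for this example.

```python
import heapq
import random


class BottomKSample:
    """Hypothetical mergeable uniform sample (bottom-k sampling).

    Each element receives an independent random key; the sketch keeps the k
    elements with the smallest keys. This is a stand-in technique for
    illustration, not the paper's dynamic sketch.
    """

    def __init__(self, k, seed=None):
        self.k = k
        self.n = 0                  # total number of elements observed
        self._heap = []             # max-heap on key, stored as (-key, value)
        self._rng = random.Random(seed)

    def _offer(self, key, value):
        # Keep the element only if it is among the k smallest keys seen so far.
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-key, value))
        elif key < -self._heap[0][0]:
            heapq.heapreplace(self._heap, (-key, value))

    def add(self, value):
        self.n += 1
        self._offer(self._rng.random(), value)

    def merge(self, other):
        # The bottom-k of a union lies in the union of the per-part bottom-k
        # sets, so merging loses nothing compared with one global sketch.
        assert self.k == other.k, "this sketch assumes equal sample sizes"
        merged = BottomKSample(self.k)
        merged.n = self.n + other.n
        for neg_key, value in self._heap + other._heap:
            merged._offer(-neg_key, value)
        return merged

    def values(self):
        return [value for _, value in self._heap]

    def estimate_count(self):
        return self.n

    def estimate_sum(self):
        # Scale the sample mean by the number of elements seen.
        vals = self.values()
        return self.n * sum(vals) / len(vals) if vals else 0.0

    def estimate_quantile(self, q):
        # Empirical q-quantile of the uniform sample.
        vals = sorted(self.values())
        if not vals:
            raise ValueError("empty sample")
        return vals[min(int(q * len(vals)), len(vals) - 1)]


if __name__ == "__main__":
    left, right = BottomKSample(100, seed=1), BottomKSample(100, seed=2)
    for x in range(1, 1001):
        left.add(x)
    for x in range(1001, 2001):
        right.add(x)
    combined = left.merge(right)
    print(combined.estimate_count(), combined.estimate_sum(), combined.estimate_quantile(0.5))
```

Each element's key is drawn independently of its arrival position, so the retained set is a uniform sample whatever order the stream delivers. Partitions processed on different nodes can therefore be merged into the sample a single node would have produced. Count is exact in this sketch. Sum and quantile estimates are sample-based approximations whose error shrinks as k grows.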