Many emerging applications impose strict response-time requirements on queries that apply different operators to distributed streaming data. As a result, approximate query answering with accurate sketches has become an important way to process fast-arriving streams. In this paper, we propose a dynamic sketching framework that can sample elements from streams with out-of-order data arrival and that provides an error-guaranteed estimation scheme for many different operators. Within the sketch, we first extract characteristics of uniform sampling and exponential sampling from one-pass streaming data and organize them to support (ξ, δ)-approximation for different operators, such as aggregation operators (e.g., sum, count) and quantile operators (e.g., quantiles, median). Moreover, we construct the sketch dynamically and without loss of accuracy through operations such as sketch splitting and sketch merging, without any prior knowledge. The experimental results indicate that, compared to big data analytic systems (Spark, BlinkDB), our approach achieves 3 times higher throughput and a two-orders-of-magnitude improvement in query response time.
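The abstract does not describe the sketch's internals, so the following is only a rough illustration of the uniform-sampling half of the idea (exponential sampling, out-of-order handling, and sketch splitting are not modeled). It is a minimal sketch assuming a fixed-capacity reservoir per stream (Vitter's Algorithm R), a sample size chosen from Hoeffding's bound k ≥ ln(2/δ) / (2ξ²) for mean-based estimates, and a hypergeometric merge of two reservoirs built over disjoint streams. The names `UniformSketch`, `sample_size`, `insert`, `merge`, and the `estimate_*` methods are hypothetical, not the authors' API.

```python
import math
import random


def sample_size(xi, delta):
    """Hoeffding bound: a uniform sample of this size estimates the mean of values
    in [lo, hi] within xi * (hi - lo) with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * xi * xi))


class UniformSketch:
    """Hypothetical fixed-capacity uniform sample of one stream (Algorithm R)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.n = 0            # stream elements seen so far
        self.sample = []      # uniform random subset of those elements
        self.rng = rng if rng is not None else random.Random()

    def insert(self, x):
        self.n += 1
        if len(self.sample) < self.capacity:
            self.sample.append(x)
        else:
            # Keep the n-th element with probability capacity / n.
            j = self.rng.randrange(self.n)
            if j < self.capacity:
                self.sample[j] = x

    def estimate_count(self, pred):
        """Scale the fraction of sampled elements matching pred up to n."""
        if not self.sample:
            return 0.0
        hits = sum(1 for x in self.sample if pred(x))
        return self.n * hits / len(self.sample)

    def estimate_sum(self):
        """n times the sample mean."""
        if not self.sample:
            return 0.0
        return self.n * sum(self.sample) / len(self.sample)

    def estimate_quantile(self, q):
        """Empirical q-quantile of the sample (0 <= q <= 1)."""
        if not self.sample:
            raise ValueError("empty sketch")
        ordered = sorted(self.sample)
        return ordered[min(len(ordered) - 1, int(q * len(ordered)))]

    def merge(self, other):
        """Combine sketches of two disjoint streams into a uniform sample of the union."""
        out = UniformSketch(min(self.capacity, other.capacity), self.rng)
        out.n = self.n + other.n
        k = min(out.capacity, out.n)
        # Decide how many of the k draws come from each side, as if drawing
        # k elements without replacement from the union (hypergeometric split).
        rem_a, rem_b, take_a = self.n, other.n, 0
        for _ in range(k):
            if self.rng.random() < rem_a / (rem_a + rem_b):
                take_a += 1
                rem_a -= 1
            else:
                rem_b -= 1
        out.sample = (self.rng.sample(self.sample, take_a)
                      + self.rng.sample(other.sample, k - take_a))
        return out


# Usage: two sites sketch disjoint halves of one stream, then merge.
rng = random.Random(7)
cap = sample_size(xi=0.05, delta=0.01)  # 1060
site_a, site_b = UniformSketch(cap, rng), UniformSketch(cap, rng)
for v in range(100_000):
    (site_a if v % 2 == 0 else site_b).insert(v)
merged = site_a.merge(site_b)
print(merged.n, len(merged.sample))  # 100000 1060
print(round(merged.estimate_sum()), merged.estimate_quantile(0.5))
```

Because each reservoir is already a uniform subset of its own stream, the merge needs only the per-site counts `n` and the samples, which is one simple way sketches from distributed sites could be combined. The Hoeffding bound also holds for sampling without replacement, so it is a conservative choice for the reservoir size here; the paper's own error analysis may differ.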