Histograms can be used as summaries of frequency data. However, staying within the error tolerance becomes problematic when dealing with dynamic data streams. For such streams, the histogram can be rebuilt every time data is discarded or collected, but this is very inefficient. If a histogram is to be employed as a quick estimate of stream data, it can instead be updated non-destructively: decrement by one the count of each bucket that a departing value belongs to, and increment by one the count of each bucket that an arriving value belongs to. In this paper, we show empirically that this method is a generally strong way to control loss of accuracy. The processing and memory costs of this error-minimizing layer are trivial, which should in turn maximize uptime. The method was tested on two histogram algorithms (Equivalent Width and Variance Optimal) in four histogram data-density scenarios (sparse, balanced, dense, and very dense), using two random value distributions (Uniform and Gaussian).
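The decrement/increment update described above can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the paper: the class name `SlidingEquiWidthHistogram`, its parameters, the fixed-size sliding window, and the assumption that bucket boundaries stay fixed between updates are all hypothetical choices made for the example.

```python
from collections import deque


class SlidingEquiWidthHistogram:
    """Hypothetical sketch: equal-width buckets over [lo, hi), updated in place."""

    def __init__(self, lo, hi, num_buckets, window_size):
        self.lo = lo
        self.num_buckets = num_buckets
        self.width = (hi - lo) / num_buckets
        self.counts = [0] * num_buckets
        self.window = deque()  # values currently summarized by the histogram
        self.window_size = window_size

    def _bucket(self, value):
        # Map a value to its bucket index; out-of-range values are clamped
        # to the first or last bucket (an assumption made for this sketch).
        idx = int((value - self.lo) / self.width)
        return min(max(idx, 0), self.num_buckets - 1)

    def add(self, value):
        # When the window is full, the oldest value leaves the histogram:
        # decrement its bucket by one instead of rebuilding everything.
        if len(self.window) == self.window_size:
            old = self.window.popleft()
            self.counts[self._bucket(old)] -= 1
        # The new value enters the histogram: increment its bucket by one.
        self.window.append(value)
        self.counts[self._bucket(value)] += 1


hist = SlidingEquiWidthHistogram(lo=0.0, hi=10.0, num_buckets=5, window_size=3)
for v in (1.0, 3.0, 9.0, 5.0):
    hist.add(v)
print(hist.counts)  # [0, 1, 1, 0, 1]
```

Each update touches at most two buckets, so its cost does not depend on how many values the histogram summarizes. For a histogram whose boundaries depend on the data, such as Variance Optimal, this sketch would only maintain the counts; when and how the boundaries are recomputed is left open here.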