Adaptive Histogram Algorithms for Approximating Frequency Queries in Dynamic Data Streams

Histograms can be used as summaries of frequency data. However, staying within the error tolerance becomes problematic when dealing with dynamic data streams. For dynamic data streams, the histograms can be reconstructed every time data is either discarded or collected which is very inefficient. If a histogram is to be employed as a quick estimate of stream data, updating the histogram non-destructively can be done using the following approach: decrement one from each bucket where data is to leave the histogram, and increment one to each bucket where data is to enter the histogram. In this paper, we empirically prove this method to be a generally strong way to control loss of accuracy. The costs of executing this error-minimizing layer are trivial to processing, memory, and should consequentially maximize uptime. This method was tested on two histogram algorithms including Equivalent Width and Variance Optimal in four specified histogram data-density scenarios including sparse, balanced, dense, and very dense, while using two different random value distribution sources including the Uniform distribution and Gaussian distribution.