Smooth Histograms for Sliding Windows

In the streaming model, elements arrive sequentially and can be observed only once. Maintaining statistics and aggregates in this model is an important and non-trivial task. It becomes even more challenging in the sliding windows model, where statistics must be maintained only over the most recent n elements. In their pioneering paper, Datar, Gionis, Indyk and Motwani [15] presented exponential histograms, an effective method for estimating statistics on sliding windows. In this paper we present a new method, smooth histograms, that improves the approximation error rate obtained via exponential histograms. Furthermore, smooth histograms not only capture and improve multiple previous results on sliding windows but also extend the class of functions that can be approximated on sliding windows. In particular, we provide the first approximation algorithms for the following functions: Lp norms for p ∉ [1,2], frequency moments, length of increasing subsequence, and geometric mean.
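To make the sliding windows setting concrete, the following is a minimal sketch of an exponential histogram for the simplest statistic: counting the 1s among the most recent n bits. It follows the bucket-merging idea as it is commonly described for basic counting and is not taken from this text. The class name `ExponentialHistogram` and the parameters `window` and `max_per_size` are hypothetical names chosen for illustration. It does not implement the smooth histograms method.

```python
class ExponentialHistogram:
    """Approximate count of 1s in the last `window` bits (illustrative sketch)."""

    def __init__(self, window, max_per_size=2):
        self.window = window              # n: number of most recent elements covered
        self.max_per_size = max_per_size  # assumed cap on buckets sharing one size
        self.buckets = []                 # [timestamp of newest 1, size], newest first
        self.time = 0

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its newest 1 has left the window.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if not bit:
            return
        self.buckets.insert(0, [self.time, 1])
        i = 0
        while i < len(self.buckets):
            size = self.buckets[i][1]
            j = i
            while j < len(self.buckets) and self.buckets[j][1] == size:
                j += 1
            if j - i <= self.max_per_size:
                break  # no overflow at this size, so nothing cascades further
            # Merge the two oldest buckets of this size; keep the newer timestamp.
            self.buckets[j - 2][1] = 2 * size
            del self.buckets[j - 1]
            i = j - 2

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only the oldest bucket may straddle the window edge; count half of it.
        return total - self.buckets[-1][1] // 2
```

With the assumed default of two buckets per size, the sketch stores a logarithmic number of buckets in the window length rather than the window itself, at the cost of an estimate that can be off by up to half of the oldest bucket. Allowing more buckets per size trades memory for a smaller error.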

[1] David P. Woodruff, et al. The communication and streaming complexity of computing the longest common and increasing subsequences, 2007, SODA '07.

[2] Rajeev Motwani, et al. Maintaining variance and k-medians over data stream windows, 2003, PODS.

[3] Lap-Kei Lee, et al. A simpler and more efficient deterministic scheme for finding frequent items over sliding windows, 2006, PODS '06.

[4] Erik Vee, et al. Finding longest increasing and common subsequences in streaming data, 2005, J. Comb. Optim.

[5] Lap-Kei Lee, et al. Maintaining significant stream statistics over sliding windows, 2006, SODA '06.

[6] Joan Feigenbaum, et al. Computing Diameter in the Streaming and Sliding-Window Models, 2002, Algorithmica.

[7] Ravi Kumar, et al. An improved data stream algorithm for frequency moments, 2004, SODA '04.

[8] Kasturi R. Varadarajan, et al. Geometric Approximation via Coresets, 2007.

[9] Sumit Ganguly, et al. Simpler algorithm for estimating frequency moments of data streams, 2006, SODA '06.

[10] David P. Woodruff, et al. Optimal approximations of the frequency moments of data streams, 2005, STOC '05.

[11] Rafail Ostrovsky, et al. Succinct Sampling on Streams, 2007, ArXiv.

[12] Hao Yuan, et al. Longest increasing subsequences in windows based on canonical antichain partition, 2007, Theor. Comput. Sci.

[13] Ravi Kumar, et al. An information statistics approach to data stream and communication complexity, 2004, J. Comput. Syst. Sci.

[14] Philip S. Yu, et al. Moment: maintaining closed frequent itemsets over a stream sliding window, 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[15] Gurmeet Singh Manku, et al. Approximate counts and quantiles over sliding windows, 2004, PODS.

[16] Robert Krauthgamer, et al. Estimating the sortedness of a data stream, 2007, SODA '07.

[17] Piotr Indyk, et al. Maintaining Stream Statistics over Sliding Windows, 2002, SIAM J. Comput.

[18] Rajeev Motwani, et al. Sampling from a moving window over streaming data, 2002, SODA '02.

[19] Subhash Khot, et al. Near-optimal lower bounds on the multi-party communication complexity of set disjointness, 2003, 18th IEEE Annual Conference on Computational Complexity, 2003. Proceedings.

[20] Timothy M. Chan, et al. Geometric Optimization Problems over Sliding Windows, 2006, Int. J. Comput. Geom. Appl.

[21] S. Muthukrishnan, et al. Estimating Rarity and Similarity over Data Stream Windows, 2002, ESA.

[22] Dan Gusfield, et al. Algorithms on strings, 1997.

[23] Srikanta Tirthapura, et al. Distributed Streams Algorithms for Sliding Windows, 2004, Theory of Computing Systems.

[24] Sumit Ganguly, et al. Estimating Frequency Moments of Data Streams Using Random Linear Combinations, 2004, APPROX-RANDOM.