We consider the problem of tracking, with small relative error, an integer function f(n) defined by a distributed update stream f'(n) in the distributed monitoring model. In this model, the updates f'(n) are distributed over k sites, which must communicate with a central coordinator to maintain an estimate of f(n). Existing streaming algorithms with worst-case guarantees for this problem assume f(n) to be monotone; there are very large lower bounds on the space requirements for summarizing a distributed non-monotonic stream, often linear in the size n of the stream. However, the input streams that attain these lower bounds are highly variable, making relatively large jumps from one timestep to the next; in practice, the impact on f(n) of any single update f'(n) is usually small. What has been lacking is a framework for non-monotonic streams that admits algorithms whose worst-case performance is as good as that of existing algorithms for monotone streams and that degrades gracefully for non-monotonic streams as those streams vary more quickly. In this paper we propose such a framework. We introduce a stream parameter, the "variability" v, and derive its definition in a way that shows it to be a natural parameter to consider for non-monotonic streams. It is also a useful parameter. From a theoretical perspective, we can adapt existing algorithms for monotone streams to work for non-monotonic streams with only minor modifications, in such a way that they reduce to the monotone case when the stream happens to be monotone and the worst-case communication bounds are refined from Θ(n) to Õ(v). From a practical perspective, we demonstrate that v can be small in practice by proving that v is O(log f(n)) for monotone streams and o(n) for streams that are "nearly" monotone or that are generated by random walks. We expect v to be o(n) for many other interesting input classes as well.
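The abstract does not spell out how v is computed. As a hedged illustration only, the Python sketch below assumes that each update is charged min(1, |f'(t)|/|f(t)|), so that a small step relative to the current value is cheap and a jump comparable to (or larger than) f(t) costs 1. This formula is our assumption, not a quotation of the paper's definition. The sketch evaluates it on three toy streams: a monotone +1 stream, a stream that alternates between 1 and 0, and a ±1 random walk. The helper name `variability` and the stream generators are hypothetical.

```python
import math
import random


def variability(updates):
    """Return an assumed variability v for integer updates f'(1), ..., f'(n).

    ASSUMPTION (not stated in the abstract): v = sum over t of
    min(1, |f'(t)| / |f(t)|), where f(t) = f'(1) + ... + f'(t).
    A nonzero update that lands on f(t) = 0 is charged the full 1;
    a zero update costs nothing.
    """
    f = 0
    v = 0.0
    for delta in updates:
        f += delta
        if delta == 0:
            continue  # no change to f, no charge
        if f == 0:
            v += 1.0  # relative change is unbounded, so charge the cap
        else:
            v += min(1.0, abs(delta) / abs(f))
    return v


n = 100_000

# Monotone stream: +1 every step, so f(t) = t and v is the harmonic number H_n.
monotone = [1] * n

# Highly variable stream: f alternates between 1 and 0, so every update costs 1.
alternating = [1 if t % 2 == 0 else -1 for t in range(n)]

# Random walk with +/-1 steps (seeded for reproducibility).
rng = random.Random(0)
walk = [rng.choice((-1, 1)) for _ in range(n)]

print(f"monotone:    v = {variability(monotone):10.1f}  (ln n = {math.log(n):.1f})")
print(f"alternating: v = {variability(alternating):10.1f}  (n = {n})")
print(f"random walk: v = {variability(walk):10.1f}")
```

Under this assumed charge, the monotone stream gives H_n, roughly ln n, in line with the O(log f(n)) claim; the alternating stream is charged 1 per update, so v = n, the regime of the linear lower bounds; and the random walk lands well below n.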