Optimal Gossip Algorithms for Exact and Approximate Quantile Computations

This paper gives drastically faster gossip algorithms to compute exact and approximate quantiles. Gossip algorithms, which allow each node to contact a uniformly random other node in each round, have been intensely studied and adopted in many applications due to their fast convergence and their robustness to failures. Kempe et al. [24] gave gossip algorithms to compute important aggregate statistics if every node is given a value. In particular, they gave a beautiful O(log n + log(1/ε)) round algorithm to ε-approximate the sum of all values and an O(log² n) round algorithm to compute the exact Φ-quantile, i.e., the ⌈Φn⌉-th smallest value. We give a quadratically faster and in fact optimal gossip algorithm for the exact Φ-quantile problem which runs in O(log n) rounds. We furthermore show that one can achieve an exponential speedup if one allows for an ε-approximation. In particular, we give an O(log log n + log(1/ε)) round gossip algorithm which computes a value of rank between Φn and (Φ + ε)n at every node. Our algorithms are extremely simple and very robust: they can be operated with the same running times even if every transmission fails with a, potentially different, constant probability. We also give a matching Ω(log log n + log(1/ε)) lower bound which shows that our algorithm is optimal for all values of ε.
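To make the two problem definitions in the abstract concrete, the following is a minimal centralized sketch in Python. It is not the paper's gossip algorithm. It only spells out what "the exact Φ-quantile" and "a value of rank between Φn and (Φ + ε)n" mean. The function names `exact_quantile` and `is_eps_approximation` are hypothetical, and the sketch assumes all values are distinct so that ranks are unambiguous.

```python
import math


def exact_quantile(values, phi):
    """Return the ceil(phi * n)-th smallest value (ranks are 1-indexed).

    Assumption: 0 < phi <= 1 and the values are distinct.
    """
    n = len(values)
    rank = max(1, math.ceil(phi * n))
    return sorted(values)[rank - 1]


def is_eps_approximation(values, candidate, phi, eps):
    """Check whether the rank of `candidate` lies in [phi * n, (phi + eps) * n].

    The rank of a candidate is the number of values less than or equal to it.
    """
    n = len(values)
    rank = sum(1 for v in values if v <= candidate)
    return phi * n <= rank <= (phi + eps) * n


if __name__ == "__main__":
    data = [7, 3, 10, 1, 9, 4, 8, 2, 6, 5]  # hypothetical node values
    print(exact_quantile(data, 0.5))
    print(is_eps_approximation(data, 6, 0.5, 0.2))
```

In the gossip setting described above, every node would have to arrive at such a value using only one random contact per round. The sketch sorts all values in one place purely to pin down the target output.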

[1]  Manuel Blum,et al.  Time Bounds for Selection , 1973, J. Comput. Syst. Sci..

[2]  Bernhard Haeupler,et al.  Breathe before speaking: efficient information dissemination despite noisy, limited and anonymous communication , 2014, PODC '14.

[3]  Gopal Pandurangan,et al.  Almost-Optimal Gossip-Based Aggregate Computation , 2012, SIAM J. Comput..

[4]  Graham Cormode,et al.  Mergeable summaries , 2012, PODS '12.

[5]  Bernhard Haeupler,et al.  Analyzing Network Coding (Gossip) Made Easy , 2010, J. ACM.

[6]  Nicola Santoro,et al.  Reduction Techniques for Selection in Distributed Files , 1989, IEEE Trans. Computers.

[7]  Thomas Sauerwald,et al.  On the runtime and robustness of randomized broadcasting , 2006, Theor. Comput. Sci..

[8]  Scott Shenker,et al.  Epidemic algorithms for replicated database maintenance , 1988, OPSR.

[9]  Hing-Fung Ting,et al.  An Ω(1/ε log 1/ε) space lower bound for finding ε-approximate quantiles in a data stream , 2010 .

[10]  Rafail Ostrovsky,et al.  A Randomized Online Quantile Summary in O(1/epsilon * log(1/epsilon)) Words , 2015, APPROX-RANDOM.

[11]  Johannes Gehrke,et al.  Gossip-based computation of aggregate information , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings..

[12]  Sanjeev Khanna,et al.  Space-efficient online computation of quantile summaries , 2001, SIGMOD '01.

[13]  Ronald L. Rivest,et al.  Expected time bounds for selection , 1975, Commun. ACM.

[14]  Nicola Santoro,et al.  On the Expected Complexity of Distributed Selection , 1988, J. Parallel Distributed Comput..

[15]  Sanjay Ranka,et al.  A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data , 1997, VLDB.

[16]  Nicola Santoro,et al.  Efficient Distributed Selection with Bounded Messages , 1997, IEEE Trans. Parallel Distributed Syst..

[17]  Rajeev Rastogi,et al.  Efficient gossip-based aggregate computation , 2006, PODS.

[18]  Roger Wattenhofer,et al.  Tight bounds for distributed selection , 2007, SPAA '07.

[19]  Qiang Ma,et al.  Frugal Streaming for Estimating Quantiles , 2013, Space-Efficient Data Structures, Streams, and Algorithms.

[20]  Boaz Patt-Shamir.  A note on efficient aggregate queries in sensor networks , 2004, PODC '04.

[21]  Edo Liberty,et al.  Optimal Quantile Approximation in Streams , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[22]  Francis Y. L. Chin,et al.  An improved algorithm for finding the median distributively , 2005, Algorithmica.

[23]  Michael Rodeh,et al.  Distributed k-selection: From a sequential to a distributed algorithm , 1983, PODC '83.

[24]  Alan M. Frieze,et al.  The shortest-path problem for graphs with random arc-lengths , 1985, Discret. Appl. Math..

[25]  Desh Ranjan,et al.  Balls and bins: A study in negative dependence , 1996, Random Struct. Algorithms.

[26]  Bruce G. Lindsay,et al.  Random sampling techniques for space efficient online computation of order statistics of large datasets , 1999, SIGMOD '99.

[27]  Richard M. Karp,et al.  Randomized rumor spreading , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[28]  Michael Rodeh,et al.  Finding the Median Distributively , 1982, J. Comput. Syst. Sci..

[29]  Nicola Santoro,et al.  A Distributed Selection Algorithm and its Expected Communication Complexity , 1992, Theor. Comput. Sci..

[30]  Divyakant Agrawal,et al.  Medians and beyond: new aggregation techniques for sensor networks , 2004, SenSys '04.

[31]  B. Pittel.  On spreading a rumor , 1987 .

[32]  Wei Hong,et al.  TAG: a Tiny Aggregation Service for Ad-hoc Sensor Networks , 2002, OSDI.

[33]  Christian Scheideler,et al.  Stabilizing consensus with the power of two choices , 2011, SPAA '11.

[34]  J. Ian Munro,et al.  Selection and sorting with limited storage , 1978, 19th Annual Symposium on Foundations of Computer Science (sfcs 1978).

[35]  Greg N. Frederickson,et al.  Tradeoffs for selection in distributed networks (Preliminary Version) , 1983, PODC '83.

[36]  Sanjeev Khanna,et al.  Power-conserving computation of order-statistics over sensor networks , 2004, PODS.

[37]  C. A. R. Hoare.  Algorithm 63: partition , 1961, CACM.

[38]  Sudipto Guha,et al.  Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams , 2009, SIAM J. Comput..

[39]  Nicola Santoro,et al.  Shout echo selection in distributed files , 1986, Networks.

[40]  Rafail Ostrovsky,et al.  A Randomized Online Quantile Summary in O((1/ε) log(1/ε)) Words , 2017, Theory Comput..

[41]  Yong Yao,et al.  The cougar approach to in-network query processing in sensor networks , 2002, SGMD.