Approximate Top-k Queries in Sensor Networks

We consider a distributed system in which each node holds a local count for each item (as in an election, where the nodes are ballot boxes and the items are candidates). A top-k query in such a system asks for the k items whose counts, summed over all nodes in the system, are the largest. In this paper we present a Monte-Carlo algorithm that outputs, with high probability, a set of k candidates that approximates the top-k items. The algorithm is motivated by sensor networks, in that it focuses on reducing the communication complexity of each individual node. In contrast to previous algorithms, its communication complexity depends only on the global scores and not on how those scores are partitioned among the nodes. When the number of nodes is large, our algorithm dramatically reduces the communication complexity compared with deterministic algorithms. We show that the complexity of our algorithm is close to a lower bound on the cell-probe complexity of any non-interactive top-k approximation algorithm. We also show that for some natural global distributions (such as the geometric or Zipf distributions), our algorithm needs only a polylogarithmic number of communication bits per node.
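The query can be made concrete with a short sketch. The Python below is a minimal illustration only. It computes the exact top-k by summing per-node counts and contrasts that with a generic Monte Carlo sampling approach, in which each unit of count (for example, each ballot) is kept independently with probability p, so that a node would transmit only its sampled counts. The function names, the parameter p, and this particular sampling scheme are hypothetical assumptions for illustration; they are not the algorithm presented in this paper.

```python
import random
from collections import Counter
from typing import Dict, List


def exact_top_k(node_counts: List[Dict[str, int]], k: int) -> List[str]:
    """Exact top-k: sum each item's count over all nodes, return the k largest."""
    totals: Counter = Counter()
    for counts in node_counts:
        totals.update(counts)  # Counter.update adds counts rather than replacing them
    return [item for item, _ in totals.most_common(k)]


def sampled_top_k(
    node_counts: List[Dict[str, int]], k: int, p: float, seed: int = 0
) -> List[str]:
    """Hypothetical Monte Carlo sketch (not the paper's algorithm).

    Each unit of count at each node is kept independently with probability p;
    a node would report only its kept units, and the top-k of the sampled
    totals is returned as the approximate answer.
    """
    rng = random.Random(seed)
    sampled: Counter = Counter()
    for counts in node_counts:
        for item, c in counts.items():
            # Binomial thinning: count how many of the c units survive sampling.
            kept = sum(1 for _ in range(c) if rng.random() < p)
            if kept:
                sampled[item] += kept
    return [item for item, _ in sampled.most_common(k)]


if __name__ == "__main__":
    # Three nodes (ballot boxes) with local counts per item (candidate).
    nodes = [
        {"a": 50, "b": 30, "c": 5},
        {"a": 10, "b": 40, "d": 20},
        {"c": 60, "d": 5},
    ]
    print(exact_top_k(nodes, 2))           # global totals: b=70, c=65, a=60, d=25
    print(sampled_top_k(nodes, 2, p=0.5))  # approximate answer from sampled counts
```

With p = 1 every unit is kept and the sampled answer coincides with the exact one. Smaller p lowers the expected number of sampled units each node must report (p times its total count) at the cost of an answer that is correct only with some probability, which is the general trade-off a Monte-Carlo top-k algorithm exploits.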
