Foundations of aggregation and synchronization in distributed systems

A distributed system consists of several autonomous devices that are capable of performing certain (computational) tasks and that have a means to communicate with each other. A computer network such as the Internet is a prototypical example of a distributed system. A distributed system has many advantages over a single computational unit; for example, the combined computational power of all its entities typically exceeds that of any single computational device considerably. At the same time, the decentralized nature of distributed systems poses significant challenges. This thesis studies two fundamental problems of distributed systems.
The first part of this thesis focuses on the problem of computing global functions that depend on the state of all devices in the system. Since each device stores only a small part of the state of the entire system, the devices must interact in order to compute such functions. If the bandwidth of the communication channels is bounded, simply encoding the state of each entity and forwarding this information to a single participant, which then computes the result of the function locally, may not be efficient. Instead, the devices can aggregate the data they receive from other devices and use this information to compute partial solutions of the global function. Such in-network aggregation techniques may greatly reduce bandwidth consumption when global functions are computed in a distributed manner. The goal is to gain a deeper understanding of the complexity of computing global functions using in-network aggregation.
In the second part of this thesis, we consider a different problem: several distributed applications and protocols require all computational devices to maintain a common notion of time, but the devices have no access to a global timer.
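Before turning to the synchronization problem, the bandwidth argument of the first part can be made concrete with a small sketch. It assumes a rooted spanning tree (here a hypothetical path of 100 nodes with made-up values) and only decomposable aggregates (count, sum and maximum), which can be merged from constant-size partial results; functions such as the median cannot be merged this way. The names `Partial`, `convergecast` and `forward_all` are illustrative and not taken from the thesis. The sketch compares the largest message and the total number of values sent when each node forwards a three-number partial aggregate to its parent versus forwarding all raw values toward the root.

```python
from dataclasses import dataclass

# Illustrative sketch; all names, the topology and the data are hypothetical.


@dataclass(frozen=True)
class Partial:
    """Constant-size partial aggregate of a subtree: (count, sum, max)."""
    count: int
    total: int
    maximum: int

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count,
                       self.total + other.total,
                       max(self.maximum, other.maximum))


def convergecast(children, values, node):
    """In-network aggregation: each child sends one 3-value partial to its parent.

    Returns (partial aggregate of the subtree, largest message, values sent)."""
    acc = Partial(1, values[node], values[node])
    largest, sent = 0, 0
    for child in children.get(node, []):
        sub, sub_largest, sub_sent = convergecast(children, values, child)
        acc = acc.merge(sub)
        largest = max(largest, sub_largest, 3)  # message child -> node has 3 values
        sent += sub_sent + 3
    return acc, largest, sent


def forward_all(children, values, node):
    """Naive baseline: each child forwards every raw value of its subtree.

    Returns (all raw values of the subtree, largest message, values sent)."""
    collected = [values[node]]
    largest, sent = 0, 0
    for child in children.get(node, []):
        sub, sub_largest, sub_sent = forward_all(children, values, child)
        collected.extend(sub)
        largest = max(largest, sub_largest, len(sub))  # message child -> node
        sent += sub_sent + len(sub)
    return collected, largest, sent


# Hypothetical topology and data: a path 0 - 1 - ... - 99 rooted at node 0.
n = 100
children = {i: [i + 1] for i in range(n - 1)}
values = {i: (7 * i) % 31 for i in range(n)}

agg, agg_largest, agg_sent = convergecast(children, values, 0)
raw, raw_largest, raw_sent = forward_all(children, values, 0)
assert agg == Partial(len(raw), sum(raw), max(raw))  # same answer at the root
print(agg_largest, agg_sent)  # 3 297
print(raw_largest, raw_sent)  # 99 4950
```

On the path, the naive approach must push 99 values over the edge next to the root and 4950 values in total, whereas the aggregating version never sends more than three values per message. The naive total equals the sum of the depths of all nodes, so the gap is largest on deep trees and smaller on shallow, bushy ones.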
If each device possesses its own clock, the differing rates of these clocks necessitate a clock synchronization algorithm, which compensates for clock drift by exchanging timing information and adjusting the clock values according to the information received. Synchronizing clocks is challenging mainly because message delays are uncontrollable and may vary, which makes it impossible for a device to determine exactly how accurate the timing information it receives from other devices is. The objective is thus to analyze the achievable degree of synchronization, which depends not only on the message delays and the clock drift rates but also on other parameters, such as the frequency of communication.
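The effect of delay uncertainty can be illustrated with a minimal two-node sketch. It assumes drift-free clocks, a single message from A to B, and delays known only to lie in a hypothetical interval [d - u, d]; the functions `observe` and `estimate_skew` are illustrative and are not the algorithm studied in the thesis. B sees only A's timestamp and its own receive time, so it cannot separate the delay from the clock skew. Guessing the midpoint delay d - u/2 keeps the error within u/2, and two executions that look identical to B show that no estimate can guarantee a smaller worst-case error.

```python
import random

# Illustrative sketch with hypothetical names and parameters: two drift-free
# clocks, one message from A to B, delays known only to lie in [d - u, d].


def observe(send_time, offset_a, offset_b, delay):
    """What B sees of one message: A's timestamp and B's own receive time.

    The real send time and the actual delay stay hidden from B."""
    timestamp_a = send_time + offset_a          # A's clock when sending
    receive_b = send_time + delay + offset_b    # B's clock when receiving
    return timestamp_a, receive_b


def estimate_skew(timestamp_a, receive_b, d, u):
    """B's estimate of offset_b - offset_a, guessing the midpoint delay d - u/2."""
    return (receive_b - timestamp_a) - (d - u / 2)


d, u = 10.0, 2.0

# Two executions that look identical to B: skew 0 with delay d,
# and skew u with delay d - u.
obs_1 = observe(100.0, offset_a=0.0, offset_b=0.0, delay=d)
obs_2 = observe(100.0, offset_a=0.0, offset_b=u, delay=d - u)
assert obs_1 == obs_2
est = estimate_skew(*obs_1, d, u)
print(est - 0.0, est - u)  # 1.0 -1.0: an error of u/2 either way

# Random executions: the midpoint guess is never off by more than u/2.
rng = random.Random(1)
worst = 0.0
for _ in range(10_000):
    skew = rng.uniform(-50.0, 50.0)
    delay = rng.uniform(d - u, d)
    obs = observe(rng.uniform(0.0, 1000.0), 0.0, skew, delay)
    worst = max(worst, abs(estimate_skew(*obs, d, u) - skew))
print(worst <= u / 2 + 1e-9)  # True
```

This agrees with the classic bound of Lundelius and Lynch: n drift-free clocks whose message delays have uncertainty u cannot be guaranteed to agree to within less than u(1 - 1/n), which is u/2 for n = 2. Drifting clocks and the frequency of communication, both mentioned above, only add to this error and are deliberately left out of the sketch.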
