On the Distributed Complexity of Computing Maximal Matchings

One of the fascinating questions of computer science is whether and to what extent randomization increases the power of algorithmic procedures. It is well known that, in general, randomization makes distributed algorithms more powerful, for there are examples of basic coordination tasks in asynchronous systems which cannot be solved by deterministic procedures but admit simple randomized solutions. Randomization is also demonstrably more powerful in synchronous systems, as shown by the important example of oblivious routing in the hypercube (see, for instance, [13, 18]). In this paper we are interested in this question in the context of distributed graph algorithms, where a synchronous, message-passing network without shared memory is to compute a function of its own topology, and we focus on the problem of computing maximal matchings. We show that maximal matchings can be computed in polylogarithmically many communication rounds by deterministic distributed algorithms. So, as far as maximal matchings are concerned, randomization is not necessary to cross the sub-linear "divide".

To put our work into perspective we review some of the relevant facts and literature. In a distributed network or architecture without shared memory the cost of sending a message between two nodes is proportional to their distance in the network. Since sending messages to far-away nodes is expensive, it is desirable that computation be based only on information available locally. This locality constraint can be quite severe when one is to compute a global function of input data which are spread across the network, and it represents a challenge from the point of view of algorithmic design. This communication problem is completely neglected in the popular PRAM model. There, the existence of a shared memory which can be accessed in unit time allows fast collection and dissemination of data among the processors.
Once this assumption is removed and the cost of communication is taken into consideration, several computational problems which were easily solvable suddenly become hard or impossible to solve efficiently, especially if one is seeking deterministic solutions. The study of distributed graph algorithms goes back to (at least) the work of Linial [14], where an Ω(log* n) lower bound for computing maximal independent sets (MIS's) in the ring is given. Together with the O(log* n) upper bound given by a beautiful algorithm of Cole and Vishkin, this is one of the all too rare examples in complexity theory where the complexity of a computational problem can be determined exactly (modulo constants). Interestingly, it can be shown that randomization does not help [19]. Generalizing from rings to bounded-degree graphs, one sees that several classical graph structures of both theoretical and practical interest, including MIS's, maximal matchings, and (Δ+1)- and even Δ-vertex colorings, can be computed in polylogarithmic time [1, 2, 7, 23]. In fact, many of these algorithms are very satisfactory because they are both quite simple and really of low complexity, i.e. with small exponents and no hidden large constants. Further generalizing from bounded-degree graphs to general topologies has proven elusive, in spite of several efforts [1, 2, 15, 20, 23, 24]. The situation here is, more or less, as follows. For a reasonably large class of graph structures, the asymptotically best deterministic algorithm known to date uses O(n^{ε(n)}) rounds, where ε(n) is a function which (very slowly) goes to 0 as n, the size of the network, grows. These solutions are mainly of theoretical interest, since the protocols are quite cumbersome and their implementation would probably be prohibitively expensive. On the other hand, once randomization is allowed, the same graph structures can be computed in O(polylog(n)) rounds.
Furthermore, these randomized algorithms are usually extremely simple and their actual complexity is very low. For instance, (Δ+1)-vertex coloring and MIS can be computed in O(log n) rounds with high probability by exceedingly simple protocols [16, 17, 21]. Another important case is that of (O(log n), O(log n))-decompositions, a very interesting type of graph decomposition with many applications, which can be computed in O(log^2 n) rounds [15]. In fact, there exist non-trivial functions, such as nearly optimal edge colourings, that can be computed, with high probability, by extremely simple, indeed trivial, randomized algorithms in o(log n) rounds or even, under suitable degree assumptions, in as few as O(log log n) rounds [8].

The question then is whether, in the context of distributed graph algorithms, randomization is necessary in order to obtain protocols which run in polylogarithmically many rounds in the size of the network. In an attempt to gain some insight into this problem, we show that for a non-trivial and important graph structure, maximal matchings, randomization is not needed. Matchings are important structures from a theoretical point of view but might also be of practical interest, since, in some situations, they correspond to a set of operations, say, data transfers, that can be performed simultaneously without mutual interference. We note that maximal matching is a special case of the difficult open problem of determining whether MIS's can be quickly computed deterministically in spite of the locality constraint. The complexity of the protocol presented in this paper is quite high, O(log^7 n) rounds, but it should be remembered that even in the EREW PRAM model the best asymptotic complexity for computing maximal matchings is O(log^4 n) [9]. Our solution hinges on a distributed procedure which, for almost all vertices in the graph, cuts the degree of a vertex almost perfectly in half.
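The flavour of such degree splitting can be conveyed by a classical sequential construction (not the distributed procedure developed in this paper): pair up the odd-degree vertices through an auxiliary vertex, trace an Euler circuit, and 2-colour its edges alternately; since consecutive circuit edges share a vertex, each vertex's incident edges fall into the two classes almost evenly, within a small additive constant. The following Python sketch, with names of our own choosing, is only a centralized illustration of the splitting property.

```python
from collections import defaultdict

def degree_split(n, edges):
    """Partition the edges of an n-vertex graph into two classes so that
    every vertex's degree is divided roughly in half (within a small
    additive constant).  Sequential Euler-partition sketch only; the
    paper's procedure is distributed and works round by round."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    work = list(edges)
    dummy = n  # auxiliary vertex pairing up the odd-degree vertices
    for v in range(n):
        if deg[v] % 2 == 1:
            work.append((dummy, v))
    adj = defaultdict(list)
    for i, (u, v) in enumerate(work):
        adj[u].append((v, i))
        adj[v].append((u, i))
    used = [False] * len(work)
    color = [0] * len(work)
    ptr = defaultdict(int)
    for start in list(adj):
        # Hierholzer's algorithm: trace the Euler circuit of this component
        stack = [(start, -1)]   # (vertex, edge id used to reach it)
        circuit = []            # edge ids in (reversed) circuit order
        while stack:
            v, eid = stack[-1]
            moved = False
            while ptr[v] < len(adj[v]):
                w, fid = adj[v][ptr[v]]
                ptr[v] += 1
                if not used[fid]:
                    used[fid] = True
                    stack.append((w, fid))
                    moved = True
                    break
            if not moved:
                stack.pop()
                if eid >= 0:
                    circuit.append(eid)
        # 2-colour the circuit's edges alternately; consecutive circuit
        # edges share a vertex, so each vertex's edges split almost evenly
        for pos, eid in enumerate(circuit):
            color[eid] = pos % 2
    # return the two classes, restricted to the original edges
    half0 = [e for i, e in enumerate(edges) if color[i] == 0]
    half1 = [e for i, e in enumerate(edges) if color[i] == 1]
    return half0, half1
```

On a 4-cycle, for instance, each vertex of degree 2 receives exactly one edge in each class; the auxiliary vertex only serves to make all degrees even before the circuit is traced.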
This approximate degree splitter might be useful in other contexts. To our knowledge, maximal matchings are one of the very few examples of non-trivial graph functions which can be computed deterministically in polylogarithmically many communication rounds in the distributed model, without additional assumptions on the input network. Other notable exceptions are the so-called ruling forests of [1] and the k-dominating sets of [12], both of which, however, are not "classical" graph structures.

We end this section by spelling out our model of computation, the synchronous, message-passing distributed network. Here, a distributed network (or architecture) is modelled as an undirected graph. The vertices of the graph correspond to processors and the edges correspond to bi-directional communication links. The network is synchronous in the sense that computation takes place in a sequence of rounds; in each round, each processor reads the messages sent to it by its neighbours in the graph, performs any amount of local computation, and sends messages back to each of its neighbours. The time complexity of a distributed algorithm is then given by the number of rounds needed to compute the desired function. Each node of the network has a unique identifier (id) and knows it. We assume that the ids of the network are the integers from 1 to n. The problem we study is this: a distributed network is to compute a maximal matching of its own (unknown) topology.
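To make the object of study concrete: a matching is a set of pairwise vertex-disjoint edges, and it is maximal when no edge of the graph can be added to it. Sequentially, with the whole graph in hand and all locality issues ignored, a maximal matching is found by a single greedy scan; the difficulty addressed in this paper is achieving the same deterministically in few rounds when each node sees only its neighbourhood. A minimal sketch of the sequential baseline, with hypothetical names of our own choosing:

```python
def greedy_maximal_matching(edges):
    """Sequential greedy scan: take an edge whenever both endpoints are
    still unmatched.  The result is maximal (no edge can be added) but
    in general not maximum; this is only the centralized baseline, not
    the distributed protocol of the paper."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

On the path 0-1-2-3 the scan takes (0, 1), skips (1, 2), and takes (2, 3): maximality only requires that every edge of the graph have at least one matched endpoint.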