Abstract

A distributed counter is a variable that is common to all processors in the system and that supports an atomic test-and-increment operation: the operation delivers the system's counter value to the requesting processor and increments it. In this work we examine different aspects of distributed counting, with the emphasis on efficiency.

A naive distributed counter stores the system's counter value with a distinguished central processor. When other processors initiate the test-and-increment operation, they send a request message to the central processor and in turn receive a reply message with the current counter value. However, with a large number of processors operating on the distributed counter, the central processor becomes a bottleneck: request messages congest at the central processor.

The primary goal of this work is to implement an efficient distributed counter. Since the efficiency of a distributed counter depends on the absence of a bottleneck, any reasonable model of efficiency must capture the essence of bottlenecks. In one approach, we minimize the number of messages which a "busiest" processor handles during a series of test-and-increment operations. We show a nontrivial lower bound and present a distributed counter that achieves this lower bound in a "sequential" setting. Since distributed counting has a multiplicity of applications (allocation of resources, synchronization of processes, mutual exclusion, or as a base for distributed data structures), the lower bound indicates the minimum coordination overhead for various distributed tasks.

In the main part of the work we present the three most important proposals for implementing an efficient distributed counter. The first is the family of Counting Networks by Aspnes, Herlihy, and Shavit; Counting Networks are distributed counters with an elegant decentralized structure that has had, and will continue to have, an enormous influence on research. In contrast to Counting Networks, the Diffracting Tree proposal by Shavit and Zemach demands that processors have a notion of time. Finally, we present the Combining Tree. Its basic idea, combining several requests into one meta request, has been well known since 1983 (Gottlieb, Lubachevski, and Rudolph). For the first time, we study a completely revised tree concept with systematic combining and other improvements. A randomized derivative of the Combining Tree, the Counting Pyramid, promises further advantages in practice.

All three schemes are checked for correctness and other characteristics. We analyze the expected time for a test-and-increment operation. In order to take the bottleneck into account, we postulate that no processor can handle an unlimited number of messages in limited time. We show that all three schemes are considerably more efficient than the central scheme. Moreover, we give evidence that the Combining Tree is an asymptotically optimal distributed counting scheme.

There are various other characteristics beyond pure speed which are desirable for a distributed counter. In particular, we examine stronger correctness conditions (e.g. linearizability). We want a counting scheme to provide more powerful operations than test-and-increment only, and we wish that a counting scheme adapts instantaneously to changing conditions in the system. Since these points are satisfied by the Counting Pyramid, it is a promising distributed counter for various practical applications. We further discuss model-specific issues such as fault tolerance and the notion of time.
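As a rough, hypothetical illustration (not code from this thesis), the following Go sketch models the naive central scheme described above: a single goroutine plays the distinguished central processor, and requests and replies are messages on channels. All identifiers are invented for this example.

```go
// Minimal sketch of the naive central counting scheme, assuming a
// message-passing system modeled with goroutines and channels.
package main

import "fmt"

// request carries a reply channel on which the central processor
// returns the counter value observed by this test-and-increment.
type request struct {
	reply chan int
}

// centralCounter models the distinguished central processor: it
// serializes all requests, which is exactly the bottleneck the
// thesis sets out to avoid.
func centralCounter(requests <-chan request) {
	value := 0
	for req := range requests {
		req.reply <- value // deliver the current value ...
		value++            // ... and increment it (atomic: single goroutine)
	}
}

func main() {
	requests := make(chan request)
	go centralCounter(requests)

	// A few "processors" issue test-and-increment operations.
	done := make(chan struct{})
	for p := 0; p < 4; p++ {
		go func(p int) {
			r := request{reply: make(chan int)}
			requests <- r
			fmt.Printf("processor %d got counter value %d\n", p, <-r.reply)
			done <- struct{}{}
		}(p)
	}
	for p := 0; p < 4; p++ {
		<-done
	}
	close(requests)
}
```

With many processors, all request messages funnel through the single centralCounter goroutine, which is the congestion effect the abstract describes.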
Our theoretical analyses are supported by simulations. We present various case studies along with their interpretation. Additionally, we present many important applications of distributed counting (e.g. distributed data structures such as stack or queue, and dynamic load balancing).

Kurzfassung

A counter in a system of distributed processors is a variable that allows every processor an atomic test-and-increment access: the current counter value is reported to the requesting processor, and the system's counter value is increased by one. In this work we examine various aspects of distributed counting, with particular emphasis on efficiency considerations.

A naive approach to a distributed counter stores the system's counter value at a central processor. When other processors access the counter value by means of a test-and-increment operation, they send a request message to the central processor and promptly receive a reply message with the current counter value. If the number of processors in the system is large, the central processor becomes overloaded: too many request messages arrive at the central processor in too short a time. The requests can no longer be answered promptly; they queue up at the central processor. The central processor becomes the bottleneck of the distributed system.

The primary goal of this work is to realize an efficient distributed counter. Since the efficiency of a distributed counter depends on the absence of a bottleneck, the bottleneck problem must enter into every reasonable model of efficiency. In a first approach, we minimize the number of messages that the "most central" processor handles during a series of test-and-increment accesses. We show a nontrivial lower bound and present a distributed counter that achieves this lower bound in a sequential setting. Since distributed counting has a multitude of applications (allocation of resources, synchronization of processes, mutual exclusion, or as a basis for distributed data structures), the lower bound sheds light on the minimum coordination overhead of many distributed problems.

In the main part of the work we present the three most important proposals for the efficient realization of distributed counters. The first is the family of Counting Networks by Aspnes, Herlihy, and Shavit; Counting Networks are distributed counters with an elegant decentralized structure that have had, and will continue to have, a great influence on research. In contrast to Counting Networks, the Diffracting Tree by Shavit and Zemach requires the processors to have a notion of time. Finally, we present the Combining Tree. Its key idea, combining several requests into one meta request, has been known since 1983 (Gottlieb, Lubachevski, and Rudolph); we revise it completely and improve it significantly. A randomized relative of the Combining Tree, the Counting Pyramid, promises further advantages in practice.

All three schemes are checked for correctness and other properties. We analyze the expected time of a test-and-increment access. To account for the bottleneck problem, we require that no processor can handle an unlimited number of messages in limited time. We show that all three schemes are considerably more efficient than the central counter. Furthermore, we discuss why the Combining Tree is the asymptotically best possible distributed counter.

Beyond pure speed, we desire various additional properties of a distributed counter. In particular, we examine stronger correctness conditions (keyword: linearizability), the flexibility to offer more elaborate operations than test-and-increment, and the ability to adapt to changing conditions in the system. These properties are the strength of the Counting Pyramid, which makes it a promising distributed counter for practical use. Moreover, we discuss model-specific issues such as fault tolerance and the notion of time.

Our theoretical analyses are supported by the results of a simulation. We present various case studies and their interpretations. In addition, we present some important applications of distributed counting (distributed data structures such as stack or queue, and dynamic load balancing methods).
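The combining idea mentioned above can be sketched in the same hypothetical style: an intermediate node gathers several pending test-and-increment requests into a single meta request that reserves a whole interval of counter values, and then distributes that interval among the waiting processors. This is only an illustration of the combining principle with a single combining node; it is not the Combining Tree algorithm studied in the thesis, and the names and batching policy are assumptions.

```go
// Sketch of the combining idea: one combiner node batches requests
// and forwards a single meta request to the counter.
package main

import "fmt"

type request struct{ reply chan int }

// metaRequest asks the counter for `count` consecutive values and
// receives the first value of the reserved interval.
type metaRequest struct {
	count int
	reply chan int
}

// counter hands out intervals [value, value+count).
func counter(meta <-chan metaRequest) {
	value := 0
	for m := range meta {
		m.reply <- value
		value += m.count
	}
}

// combiner batches up to batchSize pending requests into one meta request.
func combiner(in <-chan request, meta chan<- metaRequest, batchSize int) {
	for {
		batch := []request{<-in} // wait for at least one request
	collect:
		for len(batch) < batchSize {
			select {
			case r := <-in:
				batch = append(batch, r)
			default:
				break collect // nothing else pending: send what we have
			}
		}
		m := metaRequest{count: len(batch), reply: make(chan int)}
		meta <- m
		first := <-m.reply
		for i, r := range batch {
			r.reply <- first + i // split the reserved interval among the batch
		}
	}
}

func main() {
	in := make(chan request)
	meta := make(chan metaRequest)
	go counter(meta)
	go combiner(in, meta, 4)

	done := make(chan struct{})
	for p := 0; p < 8; p++ {
		go func(p int) {
			r := request{reply: make(chan int)}
			in <- r
			fmt.Printf("processor %d got counter value %d\n", p, <-r.reply)
			done <- struct{}{}
		}(p)
	}
	for p := 0; p < 8; p++ {
		<-done
	}
}
```

The point of the sketch is that the counter now handles one message per batch rather than one per processor, which is why combining relieves the central bottleneck.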
[1] Nir Shavit, et al. Elimination Trees and the Construction of Pools and Stacks. 1997. Theory of Computing Systems.
[2] S. Sitharama Iyengar, et al. Introduction to parallel algorithms. 1998. Wiley series on parallel and distributed computing.
[3] J. Banks, et al. Discrete-Event System Simulation. 1995.
[4] Ramesh Subramonian, et al. LogP: a practical model of parallel computation. 1996. CACM.
[5] Larry Rudolph, et al. Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors. 1983. TOPL.
[6] Jack J. Dongarra, et al. Message-Passing Performance of Various Computers. 1997. Concurr. Pract. Exp.
[7] Nir Shavit, et al. Diffracting trees. 1996. TOCS.
[8] Dimitri P. Bertsekas, et al. Data Networks. 1986.
[9] G. Gonnet, et al. On Lambert's W Function. 1993.
[10] Allan Gottlieb, et al. Operating system data structures for shared memory MIMD machines with fetch-and-add. 1988.
[11] Maurice Herlihy, et al. Contention in shared memory algorithms. 1993. JACM.
[12] R. Syski, et al. Fundamentals of Queueing Theory. 1999. Technometrics.
[13] William Aiello, et al. An atomic model for message-passing. 1993. SPAA '93.
[14] Nian-Feng Tzeng, et al. Distributing Hot-Spot Addressing in Large-Scale Multiprocessors. 1987. IEEE Transactions on Computers.
[15] Edsger W. Dijkstra, et al. Solution of a problem in concurrent programming control. 1965. CACM.
[16] Gregory F. Pfister, et al. “Hot spot” contention and combining in multistage interconnection networks. 1985. IEEE Transactions on Computers.
[17] A. Gottlieb, et al. The NYU Ultracomputer: designing a MIMD shared memory parallel computer. 1983.
[18] Eli Upfal, et al. A Steady State Analysis of Diffracting Trees. 1998. Theory of Computing Systems.
[19] Marcin Paprzycki, et al. High Performance Computing: Challenges for Future Systems. 1998. IEEE Concurrency.
[20] Michael K. Reiter, et al. Byzantine quorum systems. 1997. STOC '97.
[21] Peter March, et al. Stability of binary exponential backoff. 1988. JACM.
[22] Mamoru Maekawa, et al. A √N algorithm for mutual exclusion in decentralized systems. 1985. TOCS.
[23] A. Fleischmann. Distributed Systems. 1994. Springer Berlin Heidelberg.
[24] Peter Widmayer, et al. Balanced Distributed Search Trees Do Not Exist. 1995. WADS.
[25] Randal E. Bryant, et al. Concurrent programming. 1980. Operating Systems Engineering.
[26] DeGroot. Proceedings of the 1985 international conference on parallel processing. 1985.
[27] Roger Wattenhofer, et al. Distributed counting at maximum speed. 1997.
[28] F. T. Leighton, et al. Introduction to parallel algorithms and architectures. 1991.
[29] Mary K. Vernon, et al. Efficient synchronization primitives for large-scale cache-coherent multiprocessors. 1989. ASPLOS 1989.
[30] Nancy A. Lynch, et al. Are wait-free algorithms fast? 1994. JACM.
[31] Maurice Herlihy, et al. Scalable concurrent counting. 1995. TOCS.
[32] Michael L. Scott, et al. Algorithms for scalable synchronization on shared-memory multiprocessors. 1991. TOCS.
[33] P. Erdős, L. Lovász. Problems and Results on 3-chromatic Hypergraphs and Some Related Questions. 1975.
[34] J. S. Dagpunar, et al. Principles of Discrete Event Simulation. 1980.
[35] G. S. Graham. A New Solution of Dijkstra's Concurrent Programming Problem. 1974.
[36] C. Greg Plaxton, et al. Small-depth counting networks. 1992. STOC '92.
[37] Christos H. Papadimitriou, et al. The serializability of concurrent database updates. 1979. JACM.
[38] Roger Wattenhofer, et al. The counting pyramid. 1998.
[39] E. Upfal. A Steady State Analysis of Diffracting Trees. 1997.
[40] Marios Mavronicolas, et al. A combinatorial treatment of balancing networks. 1994. PODC '94.
[41] Amotz Bar-Noy, et al. Designing broadcasting algorithms in the postal model for message-passing systems. 1992. SPAA '92.
[42] Roger Wattenhofer, et al. An inherent bottleneck in distributed counting. 1997. PODC '97.
[43] Nancy A. Lynch, et al. Counting networks are practically linearizable. 1996. PODC '96.
[44] Gyungho Lee, et al. The Effectiveness of Combining in Shared Memory Parallel Computer in the Presence of "Hot Spots". 1986. ICPP.
[45] Moni Naor, et al. The load, capacity and availability of quorum systems. 1994. Proceedings 35th Annual Symposium on Foundations of Computer Science.
[46] Nancy A. Lynch, et al. Impossibility of distributed consensus with one faulty process. 1983. PODS '83.
[47] Hagit Attiya, et al. Sequential consistency versus linearizability. 1994. TOCS.
[48] Hagit Attiya, et al. Counting networks with arbitrary fan-out. 1992. SODA '92.
[49] Roger Wattenhofer, et al. Fast counting with the optimum combining tree. 1998.
[50] Maurice Herlihy, et al. Wait-free synchronization. 1991. TOPL.
[51] Udi Manber, et al. Introduction to algorithms - a creative approach. 1989.
[52] David Peleg, et al. Distributed Data Structures: A Complexity-Oriented View. 1991. WDAG.
[53] J. R. Jackson. Networks of Waiting Lines. 1957.
[54] Thomas E. Anderson, et al. The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors. 1990. IEEE Trans. Parallel Distributed Syst.
[55] David Peleg, et al. Crumbling walls: a class of practical and efficient quorum systems. 1995. PODC '95.
[56] Maurice Herlihy, et al. Counting networks. 1994. JACM.
[57] Leonard Allen Cohn. A conceptual approach to general purpose parallel computer architecture. 1983.
[58] Samuel Karlin, et al. A First Course on Stochastic Processes. 1968.
[59] Yossi Matias, et al. The QRQW PRAM: accounting for contention in parallel algorithms. 1994. SODA '94.
[60] Leslie G. Valiant, et al. A bridging model for parallel computation. 1990. CACM.
[61] Maurice Herlihy, et al. Low contention linearizable counting. 1991. Proceedings 32nd Annual Symposium on Foundations of Computer Science.
[62] Yehuda Afek, et al. Wait-free made fast. 1995. STOC '95.
[63] Witold Litwin, et al. Linear Hashing: A new Algorithm for Files and Tables Addressing. 1980. ICOD.
[64] Szu-Tsung Cheng, et al. Parallelism and locality in priority queues. 1994. Proceedings of 1994 6th IEEE Symposium on Parallel and Distributed Processing.
[65] Hagit Attiya, et al. Sharing memory robustly in message-passing systems. 1990. PODC '90.
[66] Maurice Herlihy, et al. Counting networks and multi-processor coordination. 1991. STOC '91.
[67] Leslie Lamport, et al. Specifying Concurrent Program Modules. 1983. TOPL.
[68] Kenneth E. Batcher, et al. Sorting networks and their applications. 1968. AFIPS Spring Joint Computing Conference.
[69] Michael Klugerman, et al. Small-depth counting networks and related topics. 1994.
[70] Nir Shavit, et al. Diffracting Trees. 1994.
[71] Maurice Herlihy, et al. Linearizability: a correctness condition for concurrent objects. 1990. TOPL.
[72] Mitchell L. Neilsen, et al. Quorum structures in distributed systems. 1992.
[73] Hector Garcia-Molina, et al. How to assign votes in a distributed system. 1985. JACM.
[74] Allan Gottlieb, et al. Coordinating parallel processors: a partial unification. 1981. CARN.
[75] Leslie Lamport, et al. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. 1979. IEEE Transactions on Computers.
[76] Marina Papatriantafilou, et al. The impact of timing on linearizability in counting networks. 1997. Proceedings 11th International Parallel Processing Symposium.
[77] Eli Upfal, et al. A simple load balancing scheme for task allocation in parallel machines. 1991. SPAA '91.
[78] E. Szemerédi, et al. O(n log n) sorting network. 1983.
[79] Ben Atkinson, et al. Queueing theory in manufacturing systems analysis and design. 1993.