Cooperative Task-Oriented Computing: Algorithms and Complexity

Abstract: Cooperative network supercomputing is becoming increasingly popular for harnessing the power of the global Internet computing platform. A typical Internet supercomputer consists of a master computer or server and a large number of computers called workers, which perform computation on behalf of the master. Despite the simplicity and benefits of the single-master approach, as the scale of such computing environments grows, it becomes unrealistic to assume the existence of an infallible master that is able to coordinate the activities of multitudes of workers. Large-scale distributed systems are inherently dynamic and are subject to perturbations, such as failures of computers and network links; thus it is also necessary to consider fully distributed peer-to-peer solutions. We present a study of cooperative computing with the focus on modeling distributed computing settings, algorithmic techniques enabling one to combine efficiency and fault-tolerance in distributed systems, and the exposition of trade...
