Scale in Distributed Systems

In recent years, scale has become a factor of increasing importance in the design of distributed systems. The scale of a system has three dimensions: numerical, geographical, and administrative. The numerical dimension consists of the number of users of the system and the number of objects and services encompassed. The geographical dimension consists of the distance over which the system is scattered. The administrative dimension consists of the number of organizations that exert control over pieces of the system. The three dimensions of scale affect distributed systems in many ways. Among the affected components are naming, authentication, authorization, accounting, communication, the use of remote resources, and the mechanisms by which users view the system. Scale affects reliability: as a system scales numerically, the likelihood that some host will be down increases; as it scales geographically, the likelihood that all hosts can communicate decreases. Scale also affects performance: its numerical component affects the load on the servers and the amount of communication; its geographic component affects communication latency. Administrative complexity is also affected by scale: administration becomes more difficult as changes become more frequent and as they require the interaction of different administrative entities, possibly with conflicting policies. Finally, scale affects heterogeneity: as the size of a system grows, it becomes less likely that all pieces will be identical. This paper looks at scale and how it affects distributed systems. Approaches taken by existing systems are examined and their common aspects highlighted. The limits of scalability in these systems are discussed. A set of principles for scalable systems is presented along with a list of questions to be asked when considering how far a system scales.
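The reliability claim can be made concrete with a small sketch. It assumes, purely for illustration, that every host is down independently with the same probability p; the paper does not state this model. Under that assumption, the chance that at least one of n hosts is down is 1 - (1 - p)^n, which approaches 1 as n grows. The function name and the 1% failure probability below are hypothetical.

```python
def prob_some_host_down(n_hosts: int, p_down: float) -> float:
    """Probability that at least one of n_hosts is down.

    Assumption (not from the paper): hosts fail independently,
    each with the same probability p_down.
    """
    if n_hosts < 0:
        raise ValueError("n_hosts must be non-negative")
    if not 0.0 <= p_down <= 1.0:
        raise ValueError("p_down must be in [0, 1]")
    # P(all hosts up) = (1 - p_down) ** n_hosts; take the complement.
    return 1.0 - (1.0 - p_down) ** n_hosts


if __name__ == "__main__":
    # Illustrative figure only: a 1% chance that any given host is down.
    for n in (1, 10, 100, 1000):
        print(n, round(prob_some_host_down(n, 0.01), 4))
```

Even with a small per-host failure probability, the result rises quickly with the number of hosts, which is why a large system cannot be designed on the expectation that every host is available at once.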
