Evaluating the Scalability of Distributed Systems

Many distributed systems must be scalable, meaning that they must be economically deployable in a wide range of sizes and configurations. This paper presents a scalability metric based on cost-effectiveness, where effectiveness is a function of the system's throughput and its quality of service. The metric is part of a framework that also includes a scaling strategy for introducing changes as a function of a scale factor, and an automated virtual design optimization at each scale factor. The framework adapts concepts from scalability measures in parallel computing. Scalability is measured by the range of scale factors that give a satisfactory value of the metric, and good scalability is a joint property of the initial design and the scaling strategy. The results give insight into the scaling capacity of the designs and into how to improve them. A rapid, simple bound on the metric is also described. The metric is demonstrated by applying it to some well-known idealized systems and to real prototypes of communications software.
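
The abstract does not state the formula for the metric, so the following is only a minimal sketch of one way a cost-effectiveness metric of this kind could be computed. It assumes that productivity at a scale factor is throughput times a quality-of-service value divided by cost, that the quality-of-service value falls as mean response time grows relative to a target, and that the metric is the ratio of productivity at a scaled configuration to productivity at a base configuration. All function names, the form of the value function, and the 0.8 threshold are illustrative assumptions, not taken from the paper.

```python
def qos_value(mean_response_time, target_response_time):
    """Assumed QoS value in (0, 1]: shrinks as response time exceeds the target."""
    return 1.0 / (1.0 + mean_response_time / target_response_time)


def productivity(throughput, mean_response_time, cost, target_response_time):
    """Assumed cost-effectiveness: delivered value per unit time, per unit cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return throughput * qos_value(mean_response_time, target_response_time) / cost


def scalability_metric(base, scaled, target_response_time):
    """Ratio of productivity at the scaled configuration to the base one.

    `base` and `scaled` are (throughput, mean_response_time, cost) tuples.
    """
    return (productivity(*scaled, target_response_time)
            / productivity(*base, target_response_time))


def satisfactory_scale_factors(configs, base_k, target_response_time, threshold=0.8):
    """Scale factors whose metric, relative to base_k, meets the assumed threshold."""
    base = configs[base_k]
    return [k for k in sorted(configs)
            if scalability_metric(base, configs[k], target_response_time) >= threshold]


# Hypothetical measurements: scale factor -> (throughput, mean response time, cost)
configs = {
    1: (100.0, 1.0, 10.0),
    2: (200.0, 1.0, 20.0),
    4: (300.0, 3.0, 40.0),
}
good_range = satisfactory_scale_factors(configs, base_k=1, target_response_time=1.0)
```

With these made-up numbers, doubling the scale keeps productivity constant, while the configuration at scale factor 4 loses productivity because throughput grows more slowly than cost and response time degrades. The range of scale factors returned is the sense in which the abstract speaks of scalability as a range.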
