Performance Properties of Large Scale Parallel Systems

Abstract: Several metrics characterize the performance of a parallel system, such as parallel execution time, speedup, and efficiency, and a number of properties of these metrics have been studied. For example, it is well known that, given a parallel architecture and a problem of fixed size, the speedup of a parallel algorithm does not continue to increase as the number of processors increases; it usually saturates or peaks at a certain limit. Thus, it may not be useful to employ more than an optimal number of processors to solve a problem on a parallel computer. This optimal number of processors depends on the problem size, the parallel algorithm, and the parallel architecture. In this paper we study the impact of parallel processing overheads and of the degree of concurrency of a parallel algorithm on the optimal number of processors when the criterion for optimality is minimization of the parallel execution time. We then study a more general criterion of optimality and show that operating at the optimal point is equivalent to operating at a unique value of efficiency that is characteristic of the criterion of optimality and of the properties of the parallel system under study. We put the technical results derived in this paper in perspective with similar results that have appeared in the literature and show how this paper generalizes or extends these earlier results.
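The saturation effect described in the abstract can be illustrated with a small numerical sketch. The overhead model used here, T_P = W/p + c*log2(p), is an assumption chosen only for illustration and is not taken from the paper; the function names, the coefficient c, and the `max_p` argument (standing in for a cap such as the degree of concurrency) are likewise hypothetical.

```python
import math


def parallel_time(work, p, overhead_coeff=2.0):
    """Assumed model (not from the paper): T_P = W/p + c*log2(p)."""
    return work / p + overhead_coeff * math.log2(p)


def speedup(work, p, overhead_coeff=2.0):
    """Speedup S = W / T_P, taking the serial time equal to the work W."""
    return work / parallel_time(work, p, overhead_coeff)


def efficiency(work, p, overhead_coeff=2.0):
    """Efficiency E = S / p."""
    return speedup(work, p, overhead_coeff) / p


def best_processor_count(work, max_p, overhead_coeff=2.0):
    """Processor count in 1..max_p that minimizes the assumed T_P.

    max_p is a hypothetical cap, e.g. the machine size or the
    degree of concurrency of the algorithm.
    """
    return min(range(1, max_p + 1),
               key=lambda p: parallel_time(work, p, overhead_coeff))


if __name__ == "__main__":
    W = 1024
    p_opt = best_processor_count(W, max_p=4096)
    for p in (1, 16, 64, p_opt, 1024, 4096):
        print(f"p={p:5d}  T_P={parallel_time(W, p):8.2f}  "
              f"S={speedup(W, p):6.2f}  E={efficiency(W, p):.3f}")
```

Under this assumed model the execution time stops decreasing near p = W ln 2 / c, so adding processors beyond that point lowers both speedup and efficiency. When `max_p` is smaller than that value, the cap rather than the overhead determines the best processor count.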
