Computing performance as a function of the speed, quantity, and cost of the processors

Everyone wants more computing power for their applications, and the industry has responded in two ways: first by increasing the speed of single CPUs, and second by deploying multiple processors in parallel. Much controversy exists over how best to balance processor speed against the number of processors employed. Is it better to have a single, very fast and very expensive CPU, or thousands of very slow but very cheap CPUs, or is there some optimal mix in between? The value of a single processor, measured in floating-point performance per dollar, is relatively easy to assess. The corresponding value of a parallel system is harder to see, because applications are generally not perfectly parallel: some efficiency is lost to sequential bottlenecks and communication overhead. Parallel speedup, the ratio of execution time on a single processor to that on p processors, is often used to capture this effect and to measure the efficiency of parallel utilization. We argue that speedup is a poor measure of parallel performance because it rewards slow processors. Instead, we evaluate delivered floating-point performance as a function of the number of processors, holding constant either the aggregate performance of the processors or the total cost. From these measures we offer two conclusions: 1) for a given aggregate floating-point performance, actual delivered performance never increases with the number of processors; and 2) for a given cost, delivered performance is maximized by selecting the fastest processor available at a given technology level and employing as many as the budget allows. These results, though generally known to researchers in parallel computing, are often overlooked in the marketing announcements promoting “massively parallel” systems. We motivate this discussion with measured performance results from an actual application, and then show the theoretical basis.
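
The claim that speedup rewards slow processors can be illustrated with a minimal sketch. The sketch is not the model developed in this paper. It assumes an Amdahl-style serial fraction and a fixed communication time that does not depend on processor speed. The function names and all numerical values are hypothetical and are not taken from the measured results.

```python
# Illustrative model only. Assumptions (not from the paper):
#   - a fraction `serial_fraction` of the work cannot be parallelized
#   - communication costs a fixed `comm_time` seconds whenever p > 1,
#     independent of how fast the processors are

def run_time(work, rate, p, serial_fraction, comm_time):
    """Seconds to execute `work` flops on p processors of `rate` flop/s each."""
    compute = (work / rate) * (serial_fraction + (1.0 - serial_fraction) / p)
    overhead = comm_time if p > 1 else 0.0
    return compute + overhead

def speedup(work, rate, p, serial_fraction, comm_time):
    """Time on one processor divided by time on p processors."""
    t1 = run_time(work, rate, 1, serial_fraction, comm_time)
    tp = run_time(work, rate, p, serial_fraction, comm_time)
    return t1 / tp

def delivered(work, rate, p, serial_fraction, comm_time):
    """Delivered performance in flop/s: useful work divided by elapsed time."""
    return work / run_time(work, rate, p, serial_fraction, comm_time)

if __name__ == "__main__":
    work, s, comm, p = 1e9, 0.05, 0.5, 16   # hypothetical values

    # Same processor count, two processor speeds.
    for rate in (1e6, 1e8):
        print(f"rate={rate:.0e}  speedup={speedup(work, rate, p, s, comm):.2f}  "
              f"delivered={delivered(work, rate, p, s, comm):.3e} flop/s")

    # Constant aggregate performance p * rate, varying processor count.
    aggregate = 1.6e8
    for n in (1, 2, 4, 8, 16):
        d = delivered(work, aggregate / n, n, s, comm)
        print(f"p={n:2d}  delivered={d:.3e} flop/s")
```

Under these assumptions, the slower processors show the higher speedup, because the fixed communication time is small relative to their long compute time, yet they deliver far less performance. With the aggregate rate held constant, delivered performance in this model does not increase as the work is spread over more, slower processors.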