Implicit Finite Element Applications: A Case for Matching the Number of Processors to the Dynamics of the Program Execution

Generally, parallel scientific applications are executed on a fixed number of processors determined to be optimal by an efficiency analysis of the application's computational kernel. It is well-known, however, that the degree of parallelism found in different parts of an application varies. In this paper, we present the results of an in-depth study quantifying the advantages of matching the number of processors to the parallelism profile for a widely used application, finite element analysis. The study entails using effectiveness as the performance metric. The results indicate that the varying processor allocation is significantly more effective than fixed processor allocation by one to nine orders of magnitude for problems with 10 to 10 nodes. Further, the results indicate that it is very effective to match the number of processors to the parallelism profile even when only a small percentage of the application has a different degree of parallelism.