Managing parallelism and resources in scientific dataflow programs

Abstract : Exploiting parallelism to achieve high performance invariably increases the resource requirements of a program. This is particularly serious under dynamic dataflow execution, because all the potential parallelism in a program is exposed. The resource requirements can be excessive, often leading to deadlock. This phenomenon is documented using parallelism and resource profiles derived under an ideal dataflow execution model. This report examines how resource requirements can be managed effectively by controlling the ways in which parallelism is exposed. A mechanism for controlling parallelism in scientific programs, called k-bounded loops, is presented. This involves compiling loops into dataflow graphs in a manner that allows the maximum number of concurrent iterations to be set dynamically, when the loop is invoked. A policy for employing this mechanism is developed and tested on a variety of programs. Through static analysis of the program, parametric resource expressions are formulated and the potential parallelism is characterized. Based on this analysis, the program is augmented with resource management code that computes the k-bounds by simple formulae, involving program variables and an overall resource parameter that reflects the capacity of the machine. This approach is shown to be effective for containing the resource requirements of scientific dataflow programs, while exposing adequate parallelism.