Scheduling, partitioning and mapping of uniform dependence algorithms on processor arrays

Three related problems, among others, are faced when trying to execute an algorithm on a parallel machine. The scheduling problem deals with minimizing the time taken to execute all computations of the algorithm without violating data dependencies. The independent partitioning problem deals with partitioning the algorithm into blocks so that no data communication takes place between computations in different blocks. The conflict-free mapping problem involves mapping n-dimensional algorithms (algorithms with n nested loops) into lower-dimensional processor arrays without computation conflicts. In this thesis, techniques to find optimal solutions to these problems are presented. Optimality is guaranteed when the techniques are applied to algorithms with uniform dependencies and linear schedules are used. Such algorithms occur frequently in scientific computing and digital signal processing applications and can often be coded as nested-loop programs. The proposed solutions can be used in optimizing compilers and to map algorithms into processor arrays, especially to program bit-level processor arrays.
A uniform dependence algorithm consists of a set of indexed computations and a set of uniform dependence vectors. If one computation uses data generated by another computation, then this data dependence is represented by the difference of their indices (called the dependence vector). A dependence vector is uniform if its value is independent of the indices of the computations. Linear schedules are a special class of schedules described by a linear mapping of computation indices into time. The complexity of the proposed method to identify optimal linear schedules is independent of the algorithm size. Linear schedules are also compared with free schedules, the best schedules possible. The comparison indicates that optimal linear schedules can be as efficient as free schedules and identifies a class of algorithms for which this is always true.
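
The definitions of uniform dependence vectors and linear schedules can be illustrated with a small sketch. It assumes a rectangular index set and the usual validity condition that a linear schedule t(j) = pi . j must assign a strictly later time to the consumer of a value than to its producer, that is, pi . d >= 1 for every dependence vector d. The function names, the example dependence vectors, and the brute-force search over a small range of candidate vectors are illustrative assumptions, not the method proposed in the thesis (whose complexity is independent of the algorithm size). The execution time is simplified here to the span of time stamps.

```python
from itertools import product

def is_valid_linear_schedule(pi, dependence_vectors):
    """A linear schedule t(j) = pi . j respects a dependence d when pi . d >= 1."""
    return all(sum(p * d for p, d in zip(pi, dep)) >= 1
               for dep in dependence_vectors)

def execution_time(pi, bounds):
    """Span of time stamps t(j) = pi . j over a rectangular index set (simplified)."""
    times = [sum(p * j for p, j in zip(pi, idx))
             for idx in product(*(range(b) for b in bounds))]
    return max(times) - min(times) + 1

# Hypothetical 2-nested-loop algorithm: every index point depends on its
# west, south, and south-west neighbours.
deps = [(1, 0), (0, 1), (1, 1)]
bounds = (4, 4)

# Brute-force search over small integer schedule vectors (illustration only).
candidates = [pi for pi in product(range(-2, 3), repeat=2)
              if is_valid_linear_schedule(pi, deps)]
best = min(candidates, key=lambda pi: execution_time(pi, bounds))
print(best, execution_time(best, bounds))
```

Because the dependence vectors are uniform, validity is checked once per dependence vector rather than once per pair of index points; only the execution-time estimate in this sketch enumerates the index set.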
Two methods are presented which find independent algorithm partitions, and both outperform previously proposed approaches in terms of computational complexity and/or optimality. To map n-dimensional algorithms into lower-dimensional processor arrays, necessary and sufficient conditions are derived for a mapping to be conflict-free, that is, for no two computations to be mapped to the same processor at the same execution time. Using these conditions together with other optimization techniques, procedures are proposed to find time-optimal, conflict-free mappings.
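
The notion of a computation conflict can be made concrete with a minimal sketch. It assumes a linear space mapping (each row of `space_rows` gives one processor coordinate), a linear schedule `pi`, and a rectangular index set, and it detects conflicts by exhaustive enumeration. This enumeration is a stand-in for illustration only; the thesis derives closed-form necessary and sufficient conditions instead. All names and the example mappings are hypothetical.

```python
from itertools import product

def find_conflicts(space_rows, pi, bounds):
    """Return pairs of index points mapped to the same processor and time step."""
    seen = {}
    conflicts = []
    for idx in product(*(range(b) for b in bounds)):
        # Processor coordinates: one linear form per row of the space mapping.
        processor = tuple(sum(s * j for s, j in zip(row, idx))
                          for row in space_rows)
        # Execution time under the linear schedule pi.
        time = sum(p * j for p, j in zip(pi, idx))
        key = (processor, time)
        if key in seen:
            conflicts.append((seen[key], idx))
        else:
            seen[key] = idx
    return conflicts

# Hypothetical 3-dimensional algorithm mapped onto a 1-dimensional array.
bounds = (3, 3, 3)
pi = (1, 1, 1)

print(find_conflicts([(1, 1, 0)], pi, bounds))  # non-empty: conflicts exist
print(find_conflicts([(1, 3, 0)], pi, bounds))  # empty list: conflict-free
```

In the first mapping, index points that differ by (1, -1, 0) land on the same processor at the same time. In the second, the only integer directions that preserve both processor and time are multiples of (-3, 1, 2), which cannot connect two points of a 3 x 3 x 3 index set, so no conflict occurs.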