Iteration Partitioning for Resolving Stride Conflicts on Cache-Coherent Multiprocessors

We develop compile-time iteration partitioning techniques for private-cache shared-memory multiprocessors. Our techniques assign loop iterations to processors so that cache coherency traffic due to interprocessor communication is minimized while load balance is maintained. Whereas most previous research has examined uniformly generated dependences, we develop methods for the non-uniform dependences generated by stride conflicts. We also account for the effects of long cache lines and minimize false coherency traffic. Our methods can handle conflicts between any two integer strides. Experiments on a 32-processor KSR-1 from Kendall Square Research show a 2x performance improvement from our partitioning algorithm over standard contiguous partitioning.
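To make "stride conflict" and "false coherency traffic" concrete, the sketch below counts how many cache lines a given iteration-to-processor assignment forces to be shared. It assumes a hypothetical loop `for i in range(n): A[write_stride*i] = f(A[read_stride*i])` with `line_size` array elements per cache line. The names `shared_lines` and `contiguous_owner` are illustrative only. This is a measurement aid for comparing candidate partitions, not the partitioning algorithm developed in the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


def contiguous_owner(n: int, procs: int) -> Callable[[int], int]:
    """Standard block partition: iteration i goes to processor i*procs // n."""
    return lambda i: i * procs // n


def shared_lines(n: int, write_stride: int, read_stride: int,
                 line_size: int, owner: Callable[[int], int]) -> int:
    """Count cache lines that are written and accessed by more than one processor.

    Models the hypothetical loop
        for i in range(n): A[write_stride * i] = f(A[read_stride * i])
    where each cache line holds `line_size` consecutive array elements.
    Because sharing is counted per line rather than per element, the count
    covers both true sharing and false sharing.
    """
    written: Set[int] = set()
    touched_by: Dict[int, Set[int]] = defaultdict(set)
    for i in range(n):
        p = owner(i)
        w_line = (write_stride * i) // line_size  # line holding the written element
        r_line = (read_stride * i) // line_size   # line holding the read element
        written.add(w_line)
        touched_by[w_line].add(p)
        touched_by[r_line].add(p)
    # Read-only lines can be replicated freely; only written lines cause coherency traffic.
    return sum(1 for line in written if len(touched_by[line]) > 1)


if __name__ == "__main__":
    n, procs, line = 16, 4, 4
    # Write stride 2 conflicts with read stride 1.
    print(shared_lines(n, 2, 1, line, contiguous_owner(n, procs)))  # contiguous blocks
    print(shared_lines(n, 2, 1, line, lambda i: i % procs))         # cyclic assignment
```

With 16 iterations, 4 processors, and 4-element lines, contiguous blocks leave 3 shared lines, because each block writes lines that a later block reads. A cyclic assignment leaves 8, because adjacent iterations always write the same line. A partitioning heuristic would aim to lower this count while keeping per-processor iteration counts balanced.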
