Efficient Program Partitioning Based on Compiler Controlled Communication

In this paper, we present an efficient framework for intraprocedural performance based program partitioning for sequential loop nests. Due to the limitations of static dependence analysis especially in the inter-procedural sense, many loop nests are identified as sequential but available task parallelism amongst them could be potentially exploited. Since this available parallelism is quite limited, performance based program analysis and partitioning which carefully analyzes the interaction between the loop nests and the underlying architectural characteristics must be undertaken to effectively use this parallelism.

[1]  Dharma P. Agrawal,et al.  A Scalable Scheduling Scheme for Functional Parallelism on Distributed Memory Multiprocessor Systems , 1995, IEEE Trans. Parallel Distributed Syst..

[2]  Thomas Fahringer Estimating and Optimizing Performance for Parallel Programs , 1995, Computer.

[3]  Vivek Sarkar,et al.  Partitioning and Scheduling Parallel Programs for Multiprocessing , 1989 .

[4]  Rudolf Eigenmann,et al.  Symbolic range propagation , 1995, Proceedings of 9th International Parallel Processing Symposium.

[5]  James R. Larus,et al.  The Wisconsin Wind Tunnel: virtual prototyping of parallel computers , 1993, SIGMETRICS '93.

[6]  Santosh Pande,et al.  Program Repartitioning on Varying Communication Cost Parallel Architectures , 1996, J. Parallel Distributed Comput..

[7]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[8]  Ken Kennedy,et al.  A static performance estimator to guide data partitioning decisions , 1991, PPOPP '91.

[9]  Utpal Banerjee Loop Parallelization , 1994, Springer US.

[10]  Jaspal Subhlok,et al.  Optimal mapping of sequences of data parallel tasks , 1995, PPOPP '95.

[11]  Barton P. Miller,et al.  The Paradyn Parallel Performance Measurement Tool , 1995, Computer.

[12]  Philippe Chrétienne Tree Scheduling with Communication Delays , 1994, Discret. Appl. Math..

[13]  Ishfaq Ahmad,et al.  Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors , 1996, IEEE Trans. Parallel Distributed Syst..

[14]  Tao Yang,et al.  On the Granularity and Clustering of Directed Acyclic Task Graphs , 1993, IEEE Trans. Parallel Distributed Syst..

[15]  Andrew A. Chien,et al.  Software overhead in messaging layers: where does the time go? , 1994, ASPLOS VI.

[16]  Rice UniversityCORPORATE,et al.  High performance Fortran language specification , 1993 .

[17]  D.A. Reed,et al.  Scalable performance analysis: the Pablo performance analysis environment , 1993, Proceedings of Scalable Parallel Libraries Conference.

[18]  Tao Yang,et al.  DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors , 1994, IEEE Trans. Parallel Distributed Syst..

[19]  Dharma P. Agrawal,et al.  Optimal Scheduling Algorithm for Distributed-Memory Machines , 1998, IEEE Trans. Parallel Distributed Syst..

[20]  Keshav Pingali,et al.  Solving Alignment Using Elementary Linear Algebra , 2001, Compiler Optimizations for Scalable Parallel Systems Languages.