Mapping Unstructured Applications into Nested Parallelism

Nested parallel programming models, in which the task graph associated with a computation is series-parallel, are easy to program and have good analysis properties. These properties can be exploited for efficient scheduling, accurate cost estimation, or automatic mapping to different architectures. However, restricting synchronization structures to nested series-parallelism may cause performance losses, because the resulting solution can be less parallel than more general ones based on unstructured models (e.g., message passing). A new algorithmic technique is presented that automatically transforms the task graph of any unstructured application into series-parallel form (nested parallelism). The tool is applied to random and irregular application task graphs to investigate the potential performance degradation incurred when converting them into series-parallel form. Results show that a wide range of irregular applications can be expressed using a structured coordination model with only a small loss of parallelism.
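To make the trade-off concrete, the following minimal sketch compares the critical path of a small unstructured task graph with that of a series-parallel version obtained by naive layering (a full barrier after each topological level). This layering is a simple baseline chosen for illustration and is not the technique presented in the paper; the function names, the example graph, and the task costs are all assumptions, and communication costs are ignored.

```python
from collections import defaultdict

def longest_paths(cost, edges, weight):
    """For each task t, the largest sum of weight(.) over any path ending at t."""
    succs = defaultdict(list)
    indeg = {t: 0 for t in cost}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    best = {t: weight(t) for t in cost}
    ready = [t for t in cost if indeg[t] == 0]
    while ready:  # Kahn-style topological traversal
        u = ready.pop()
        for v in succs[u]:
            best[v] = max(best[v], best[u] + weight(v))
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return best

def critical_path(cost, edges):
    """Length of the longest cost-weighted path in the original task graph."""
    return max(longest_paths(cost, edges, lambda t: cost[t]).values())

def layered_makespan(cost, edges):
    """Critical path after naive layering: tasks at the same depth run in
    parallel, and a barrier separates consecutive layers (a series-parallel graph)."""
    depth = longest_paths(cost, edges, lambda t: 1)  # 1-based layer index
    layers = defaultdict(list)
    for t, d in depth.items():
        layers[d].append(cost[t])
    return sum(max(c) for c in layers.values())

# Hypothetical "N"-shaped graph, the smallest non-series-parallel pattern.
costs = {"a": 5, "b": 1, "c": 1, "d": 5}
edges = [("a", "c"), ("b", "c"), ("b", "d")]

print(critical_path(costs, edges))     # 6
print(layered_makespan(costs, edges))  # 10
```

In this assumed example the barrier forces the short task `c` to wait for the long task `a` in the previous layer, and `d` to wait for `a` even though it depends only on `b`, so the critical path grows from 6 to 10. A less conservative transformation would aim to add fewer synchronization dependencies than a full barrier.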
