Work in Progress: Malleable Software Pipelines for Efficient Many-core System Utilization

This paper details our current research project on the efficient utilization of many-core systems by utilizing applications based on a novel kind of software pipelines. These pipelines form malleable applications that can change their degree of parallelism at runtime. This allows not only for a well-balanced load, but also for an efficient distribution of the cores to the individual, competing applications to maximize the overall system performance. We are convinced that malleable software pipelines will significantly outperform existing mapping and scheduling solutions.

[1]  Timothy Mattson,et al.  A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[2]  Laxmikant V. Kalé,et al.  CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.

[3]  William Thies,et al.  A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[4]  Achim Streit,et al.  Efficient Resource Management for Malleable Applications , 2001 .

[5]  Laxmikant V. Kalé,et al.  Adaptive MPI , 2003, LCPC.

[6]  Chuck Pheatt,et al.  Intel® threading building blocks , 2008 .

[7]  Coniferous softwood GENERAL TERMS , 2003 .

[8]  Laxmikant V. Kalé,et al.  A Malleable-Job System for Timeshared Parallel Machines , 2002, 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'02).

[9]  Allen B. Downey,et al.  A parallel workload model and its implications for processor allocation , 1996, Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183).

[10]  Peter Sanders,et al.  Efficient Parallel Scheduling of Malleable Tasks , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[11]  Wolfgang Schröder-Preikschat,et al.  DistRM: Distributed resource management for on-chip many-core systems , 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[12]  Rainer Leupers,et al.  MAPS: An integrated framework for MPSoC application parallelization , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[13]  Mark S. Squillante,et al.  Dynamic Partitioning in Different Distributed-Memory Environments , 1996, JSSPP.

[14]  Guilherme Ottoni,et al.  Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[15]  Margaret Martonosi,et al.  Adaptive parallelism in compiler‐parallelized code , 1998 .