Runtime Resource Allocation for Software Pipelines

Efficiently allocating the computational resources of many-core systems is one of the most prominent challenges, especially when resource requirements may vary unpredictably at runtime. This is even more challenging when facing unreliable cores—a scenario that becomes common as the number of cores increases and integration sizes shrink. To address this challenge, this article presents an optimal method for the allocation of the resources to software-pipelined applications. Here we show how runtime observations of the resource requirements of tasks can be used to adapt resource allocations. Furthermore, we show how the optimum can be traded for a high degree of scalability by clustering applications in a distributed, hierarchical manner. To diminish the negative effects of unreliable cores, this article shows how self-organization can effectively restore the integrity of such a hierarchy when it is corrupted by a failing core. Experiments on Intel’s 48-core Single-Chip Cloud Computer and in a many-core simulator show that a significant improvement in system throughput can be achieved over the current state of the art.

[1]  Jörg Henkel,et al.  Optimizations for configuring and mapping software pipelines in many core systems , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[2]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[3]  Dean M. Tullsen,et al.  Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.

[4]  Larry Rudolph,et al.  Proceedings of the Job Scheduling Strategies for Parallel Processing , 1997 .

[5]  Coniferous softwood GENERAL TERMS , 2003 .

[6]  Andrea Acquaviva,et al.  Assessing Task Migration Impact on Embedded Soft Real-Time Streaming Multimedia Applications , 2008, EURASIP J. Embed. Syst..

[7]  Kevin Klues,et al.  Processes and Resource Management in a Scalable Many-core OS ∗ , 2010 .

[8]  Tong Li,et al.  Efficient operating system scheduling for performance-asymmetric multi-core architectures , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[9]  Julie A. McCann,et al.  A survey of autonomic computing—degrees, models, and applications , 2008, CSUR.

[10]  Albert Cohen,et al.  The Polyhedral Model Is More Widely Applicable Than You Think , 2010, CC.

[11]  Hyesoon Kim,et al.  Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12]  Davide Bertozzi,et al.  Supporting Task Migration in Multi-Processor Systems-on-Chip: A Feasibility Study , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[13]  Sahin Albayrak,et al.  Mobility-based Runtime Load Balancing in Multi-Agent Systems , 2006, SEKE.

[14]  William Thies,et al.  A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[15]  Narayanan Vijaykrishnan,et al.  Reliability concerns in embedded system designs , 2006, Computer.

[16]  Shekhar Y. Borkar,et al.  Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.

[17]  Théodore Marescaux,et al.  Centralized run-time resource management in a network-on-chip containing reconfigurable hardware tiles , 2005, Design, Automation and Test in Europe.

[18]  Brian T. Lewis,et al.  Thread Scheduling for Multi-Core Platforms , 2007, HotOS.

[19]  Bradford Nichols,et al.  Pthreads programming - a POSIX standard for better multiprocessing , 1996 .

[20]  Francisco J. Cazorla,et al.  Optimal task assignment in multithreaded processors: a statistical approach , 2012, ASPLOS XVII.

[21]  Allen B. Downey,et al.  A parallel workload model and its implications for processor allocation , 1996, Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183).

[22]  Karthikeyan Sankaralingam,et al.  Dark Silicon and the End of Multicore Scaling , 2012, IEEE Micro.

[23]  Lothar Thiele,et al.  Scenario-based design flow for mapping streaming applications onto on-chip many-core systems , 2012, CASES '12.

[24]  William Thies,et al.  StreamIt: A Language for Streaming Applications , 2002, CC.

[25]  Philip S. Yu,et al.  Approximate algorithms scheduling parallelizable tasks , 1992, SPAA '92.

[26]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[27]  Rainer Leupers,et al.  MAPS: An integrated framework for MPSoC application parallelization , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[28]  Karam S. Chatha,et al.  Dynamic scheduling of stream programs on embedded multi-core processors , 2012, CODES+ISSS '12.

[29]  Fernando Gehm Moraes,et al.  Heuristics for Dynamic Task Mapping in NoC-based Heterogeneous MPSoCs , 2007, 18th IEEE/IFIP International Workshop on Rapid System Prototyping (RSP '07).

[30]  Amit Kumar Singh,et al.  Mapping on multi/many-core systems: Survey of current and emerging trends , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[31]  B GibbonsPhillip ACM transactions on parallel computing , 2014 .

[32]  Jörg Henkel,et al.  ADAM: Run-time agent-based distributed application mapping for on-chip communication , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[33]  Timothy Mattson,et al.  A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[34]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[35]  Jacques M. Bahi,et al.  Dynamic load balancing and efficient load estimators for asynchronous iterative algorithms , 2005, IEEE Transactions on Parallel and Distributed Systems.

[36]  Jörg Henkel,et al.  CARAT: Context-aware runtime adaptive task migration for multi core architectures , 2011, 2011 Design, Automation & Test in Europe.

[37]  M TullsenDean,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000 .

[38]  Li Shang,et al.  Power, Thermal, and Reliability Modeling in Nanometer-Scale Microprocessors , 2007, IEEE Micro.

[39]  Karam S. Chatha,et al.  Unrolling and retiming of stream applications onto embedded multicore processors , 2012, DAC Design Automation Conference 2012.

[40]  Soonhoi Ha,et al.  Executing synchronous dataflow graphs on a SPM-based multicore architecture , 2012, DAC Design Automation Conference 2012.

[41]  John P. Lehoczky,et al.  Partitioned Fixed-Priority Preemptive Scheduling for Multi-core Processors , 2009, 2009 21st Euromicro Conference on Real-Time Systems.

[42]  Paul Feautrier,et al.  Polyhedron Model , 2011, Encyclopedia of Parallel Computing.

[43]  Michael Hitchens,et al.  A new process migration algorithm , 1997, OPSR.

[44]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[45]  Todor Stefanov,et al.  Managing latency in embedded streaming applications under hard-real-time scheduling , 2012, CODES+ISSS '12.

[46]  Rainer Leupers,et al.  Communication-aware mapping of KPN applications onto heterogeneous MPSoCs , 2012, DAC Design Automation Conference 2012.

[47]  Wolfgang Schröder-Preikschat,et al.  DistRM: Distributed resource management for on-chip many-core systems , 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[48]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[49]  Jörg Henkel,et al.  Pipelets: Self-organizing software Pipelines for many-core architectures , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[50]  Guilherme Ottoni,et al.  Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[51]  Radu Marculescu,et al.  User-Aware Dynamic Task Allocation in Networks-on-Chip , 2008, 2008 Design, Automation and Test in Europe.

[52]  Peter Kulchyski and , 2015 .