Multitasking workload scheduling on flexible core chip multiprocessors

While technology trends have ushered in the age of chip multiprocessors (CMP) and enabled designers to place an increasing number of cores on chip, a fundamental question is what size to make each core. Most current commercial designs are symmetric CMPs in which each core is identical and range from a relatively simple RISC pipeline to a large and complicated out-of-order x86 core. When the granularity of parallelism in the tasks matches the granularity of the processing cores, a CMP will be at its most efficient. To adjust the granularity of a core to the tasks running on it, recent research has proposed flexible-core chip multiprocessors, which typically consist of a number of small processing cores that can be aggregated to form larger logical processors. These architectures introduce a new resource allocation and scheduling problem which must determine how many logical processors should be configured, how powerful each processor should be, and where/when each task should run. This paper introduces and motivates this new scheduling problem, describes the challenges associated with it, and examines and evaluates several algorithms (amenable to implementation in an operating system) appropriate for such flexible-core CMPs. We also describe how scheduling for flexible-core architectures differs from scheduling for fixed multicore architectures, and compare the performance of flexible-core CMPs to both symmetric and asymmetric fixed-core CMPs.

[1]  M. Suzuoki,et al.  Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor , 2006, IEEE Journal of Solid-State Circuits.

[2]  Murali Annavaram,et al.  Mitigating Amdahl's Law through EPI Throttling , 2005, ISCA 2005.

[3]  Shikharesh Majumdar,et al.  Scheduling in multiprogrammed parallel systems , 1988, SIGMETRICS 1988.

[4]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[5]  Dirk Grunwald,et al.  Aide de Camp: Asymmetric Dual Core Design for Power and Energy Reduction ; CU-CS-964-03 , 2003 .

[6]  John Paul Shen,et al.  Best of both latency and throughput , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..

[7]  Anant Agarwal,et al.  Versatility and VersaBench: A New Metric and a Benchmark Suite for Flexible Architectures , 2004 .

[8]  Konrad K. Lai,et al.  The Impact of Performance Asymmetry in Emerging Multicore Architectures , 2005, ISCA 2005.

[9]  Brad Calder,et al.  Discovering and Exploiting Program Phases , 2003, IEEE Micro.

[10]  Lizy Kurian John,et al.  Scaling to the end of silicon with EDGE architectures , 2004, Computer.

[11]  Scott A. Mahlke,et al.  Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[12]  S. Winkel Optimal versus Heuristic Global Code Scheduling , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[13]  Jesús Labarta,et al.  Performance-driven processor allocation , 2000, IEEE Transactions on Parallel and Distributed Systems.

[14]  Doug Burger,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[15]  Engin Ipek,et al.  Core fusion: accommodating software diversity in chip multiprocessors , 2007, ISCA '07.

[16]  Ashok Kumar,et al.  An 8-Core 64-Thread 64b Power-Efficient SPARC SoC , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[17]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[18]  Toshihide Ibaraki,et al.  Resource allocation problems - algorithmic approaches , 1988, MIT Press series in the foundations of computing.

[19]  Larry Rudolph,et al.  Metrics and Benchmarking for Parallel Job Scheduling , 1998, JSSPP.

[20]  Kevin Skadron,et al.  Federation: Out-of-Order Execution using Simple In-Order Cores , 2007 .

[21]  Uwe Schwiegelshohn,et al.  Parallel Job Scheduling - A Status Report , 2004, JSSPP.

[22]  David W. Wall,et al.  Limits of instruction-level parallelism , 1991, ASPLOS IV.