Operating systems should manage accelerators

The inexorable demand for computing power has led to increasing interest in accelerator-based designs. An accelerator is a specialized hardware unit that performs a set of tasks with much higher performance or power efficiency than a general-purpose CPU. Accelerators may be embedded in the pipeline as functional units, as with SIMD instructions, or attached to the system as separate devices, as with a cryptographic co-processor. Current operating systems provide little support for accelerators: whether integrated into a processor or attached as a device, they are treated as a CPU or as a device and given no additional consideration. However, future processors may have designs that require more management by the operating system. For example, heterogeneous processors may provision only some cores with accelerators, and IBM's wire-speed processor allows user-mode code to launch computations on a shared accelerator without kernel involvement. In such systems, the OS can improve performance by allocating accelerator resources and scheduling access to accelerators, just as it does for memory and CPU time. In this paper, we discuss the challenges presented by adopting accelerators as an execution resource managed by the operating system. We also present the initial design of our system, which provides flexible control over where and when code executes and can apply power and performance policies. It exposes a simple software interface that can leverage new hardware interfaces as well as share specialized units in a heterogeneous system.
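To make the idea of an OS-managed accelerator interface concrete, the following is a minimal sketch of the kind of user-level API such a system might expose. All names here (accel_open, accel_submit, accel_wait, accel_task_t) are illustrative assumptions rather than the paper's actual interface; the point is only that the application names the work it wants done, while the OS decides where and when it runs.

```c
/* Hypothetical sketch of a user-level accelerator interface that an
 * accelerator-aware OS might provide.  Names and signatures are
 * assumptions for illustration, not the system described in the paper. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Opaque handle to an accelerator context granted by the OS. */
typedef struct accel_ctx accel_ctx_t;

/* A work descriptor: the OS (or a hardware queue) chooses where this
 * runs, so the same program works whether the accelerator is a shared
 * on-chip unit, a per-core functional unit, or a software fallback. */
typedef struct {
    uint32_t    opcode;   /* which specialized operation to run */
    const void *input;    /* input buffer                       */
    void       *output;   /* output buffer                      */
    size_t      len;      /* buffer length in bytes             */
} accel_task_t;

/* These would be supplied by the OS or its runtime library; stubs are
 * included here only so the sketch is self-contained and compiles. */
accel_ctx_t *accel_open(const char *kind)         { (void)kind; return NULL; }
int accel_submit(accel_ctx_t *c, accel_task_t *t) { (void)c; (void)t; return -1; }
int accel_wait(accel_ctx_t *c)                    { (void)c; return -1; }

int main(void) {
    /* Ask the OS for access to a cryptographic accelerator.  The OS may
     * grant a dedicated unit, time-share a shared one, or deny the
     * request and leave the work to run on a general-purpose core. */
    accel_ctx_t *ctx = accel_open("crypto");
    if (!ctx) {
        fprintf(stderr, "no accelerator granted; falling back to CPU\n");
        return 0;
    }

    uint8_t in[64] = {0}, out[64];
    accel_task_t task = { .opcode = 1, .input = in, .output = out, .len = sizeof in };

    /* Submission might go directly to a hardware queue (as on the
     * wire-speed processor) or trap into the kernel; the interface
     * hides that choice from the application. */
    if (accel_submit(ctx, &task) == 0)
        accel_wait(ctx);
    return 0;
}
```

A design in this spirit would let the OS apply power and performance policies at the accel_open and accel_submit boundaries, for example by steering requests to idle units or batching submissions from competing processes.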
