An online scheduler for hardware accelerators on general-purpose operating systems

This paper presents an online scheduling algorithm for hardware accelerators and its implementation on the NetBSD operating system. The scheduler uses the current performance characteristics of the accelerators to decide which ones to load and unload. An evaluation on a number of workloads shows that the scheduler is typically within 20% of the optimal schedule computed offline. The hardware support consists of simple cost-benefit indicators, which any online scheduling algorithm can use. The NetBSD modifications consist primarily of loadable kernel modules, with minimal changes to the operating system itself. The measured overhead is negligible when accelerators are not in use; otherwise it grows linearly, with a small constant factor, in the number of active accelerators.
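To make the idea of scheduling from cost-benefit indicators concrete, the following is a minimal sketch of one possible online policy, a break-even (rent-or-buy) rule. It is not the algorithm presented in the paper: the class `BreakEvenScheduler`, the fields `load_cost`, `missed`, and `recent`, the decay factor of 0.5, and the fixed number of slots are all assumptions made for illustration. An unloaded accelerator is loaded once the benefit it has forgone reaches its load cost, and the least recently useful loaded accelerator is evicted when no slot is free.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    load_cost: float      # hypothetical cost indicator: cycles needed to load
    missed: float = 0.0   # benefit forgone while unloaded (cycles)
    recent: float = 0.0   # decayed benefit delivered while loaded (cycles)
    loaded: bool = False


class BreakEvenScheduler:
    """Illustrative rent-or-buy policy; not the paper's scheduler."""

    def __init__(self, slots: int):
        self.slots = slots                      # assumed number of accelerator slots
        self.accels: dict[str, Accelerator] = {}

    def register(self, name: str, load_cost: float) -> None:
        self.accels[name] = Accelerator(name, load_cost)

    def loaded(self) -> list[str]:
        return sorted(a.name for a in self.accels.values() if a.loaded)

    def tick(self, benefits: dict[str, float]) -> list[str]:
        """Process one sampling interval.

        `benefits` maps an accelerator name to the cycles it saved (if loaded)
        or would have saved (if unloaded) during the interval.
        """
        for a in self.accels.values():
            b = benefits.get(a.name, 0.0)
            if a.loaded:
                a.recent = 0.5 * a.recent + b   # exponentially decayed usefulness
            else:
                a.missed += b                   # accumulate forgone benefit
        # Consider the unloaded accelerators with the largest forgone benefit first.
        candidates = sorted(
            (a for a in self.accels.values() if not a.loaded),
            key=lambda a: a.missed,
            reverse=True,
        )
        for cand in candidates:
            if cand.missed >= cand.load_cost:   # break-even point reached
                self._try_load(cand)
        return self.loaded()

    def _try_load(self, cand: Accelerator) -> None:
        if self.slots <= 0:
            return
        active = [a for a in self.accels.values() if a.loaded]
        if len(active) >= self.slots:
            victim = min(active, key=lambda a: a.recent)
            if victim.recent >= cand.missed:
                return                          # victim is still more useful; keep it
            victim.loaded = False
            victim.missed = 0.0
        cand.loaded = True
        cand.missed = 0.0
        cand.recent = 0.0
```

With one slot and two accelerators that each cost 100 cycles to load, three intervals reporting 40 cycles of forgone benefit are needed before the first accelerator is loaded. It is replaced only after the second accelerator has accumulated enough forgone benefit to pay for its own load.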
