Unifying software and hardware of multithreaded reconfigurable applications within operating system processes

Novel reconfigurable System-on-Chip (SoC) devices make it possible to combine software with application-specific hardware accelerators to speed up applications. However, mixing user software with user hardware usually costs the principal programming abstractions and system-software commodities, because hardware accelerators (1) have no execution context: the programmer is typically expected to provide one for each accelerator; (2) have no virtual memory abstraction: again, the programmer has to communicate data from user software space to user hardware, which is usually burdensome and sometimes impossible; (3) cannot invoke system services (e.g., to allocate memory, open files, or communicate); and (4) are not easily portable: they depend mostly on system-level interfacing, although they logically belong to the application level. We introduce a unified Operating System (OS) process for codesigned reconfigurable applications that provides (1) a unified memory abstraction for the software and hardware parts of an application, (2) execution transfers from software to hardware and vice versa, thus enabling hardware accelerators to use system services and to call back other software and hardware functions, and (3) multithreaded execution of multiple software and hardware threads. The unified OS process ensures the portability of codesigned applications by providing standardised means of interfacing. Adding yet another abstraction layer usually affects performance; we show that runtime optimisations in the system layer supporting the unified OS process can minimise the performance loss and can even outperform typical approaches. The unified OS process also fosters unrestricted automated synthesis of software to hardware, thus allowing unlimited migration of application components. We demonstrate the advantages of the unified OS process in practice on Linux systems running on Xilinx Virtex-II Pro and Altera Excalibur reconfigurable devices.
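To make the idea of a hardware thread that lives inside an ordinary OS process more concrete, here is a minimal sketch. It is not the paper's implementation: the "hardware" thread is emulated with a POSIX thread, and every name (`callback_channel`, `hw_call_malloc`, `delegate_loop`, `hw_square_offload`) is hypothetical. It illustrates two of the listed properties under these assumptions: the accelerator receives plain user-space pointers, with no copying, because it shares the process's virtual address space; and it obtains a system service (here, `malloc`) through a callback that a software delegate thread executes on its behalf. Compile with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request kinds an (emulated) hardware thread may send. */
typedef enum { REQ_NONE, REQ_MALLOC, REQ_DONE } req_kind;

/* Hypothetical callback channel between the accelerator and its software
   delegate; this sketch allows one outstanding request at a time. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    req_kind kind;
    size_t arg;    /* request argument, e.g. allocation size in bytes */
    void *result;  /* value returned by the software side */
    int pending;   /* 1 while a request awaits service */
} callback_channel;

static callback_channel chan = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
    REQ_NONE, 0, NULL, 0
};

/* Hardware side: ask software to call malloc() and block until it has. */
static void *hw_call_malloc(size_t n) {
    pthread_mutex_lock(&chan.lock);
    chan.kind = REQ_MALLOC;
    chan.arg = n;
    chan.pending = 1;
    pthread_cond_broadcast(&chan.cond);
    while (chan.pending)
        pthread_cond_wait(&chan.cond, &chan.lock);
    void *p = chan.result;
    pthread_mutex_unlock(&chan.lock);
    return p;
}

/* Hardware side: tell the delegate that execution returns to software. */
static void hw_signal_done(void) {
    pthread_mutex_lock(&chan.lock);
    chan.kind = REQ_DONE;
    chan.pending = 1;
    pthread_cond_broadcast(&chan.cond);
    pthread_mutex_unlock(&chan.lock);
}

/* Arguments are ordinary pointers valid in the shared address space. */
typedef struct {
    const int *in;
    size_t len;
    int *out;  /* allocated via the malloc callback */
} hw_args;

/* Emulated accelerator: squares each input element. */
static void *hw_thread(void *p) {
    hw_args *a = p;
    a->out = hw_call_malloc(a->len * sizeof *a->out);
    if (a->out)
        for (size_t i = 0; i < a->len; i++)
            a->out[i] = a->in[i] * a->in[i];
    hw_signal_done();
    return NULL;
}

/* Software delegate: serve system-service requests until REQ_DONE. */
static void delegate_loop(void) {
    pthread_mutex_lock(&chan.lock);
    for (;;) {
        while (!chan.pending)
            pthread_cond_wait(&chan.cond, &chan.lock);
        if (chan.kind == REQ_DONE) {
            chan.pending = 0;
            break;
        }
        assert(chan.kind == REQ_MALLOC);  /* only service in this sketch */
        chan.result = malloc(chan.arg);
        chan.pending = 0;
        pthread_cond_broadcast(&chan.cond);
    }
    pthread_mutex_unlock(&chan.lock);
}

/* Launch the emulated hardware thread and serve its callbacks.
   Returns a malloc'd array the caller must free, or NULL on failure. */
int *hw_square_offload(const int *in, size_t len) {
    hw_args a = { in, len, NULL };
    pthread_t hw;
    if (pthread_create(&hw, NULL, hw_thread, &a) != 0)
        return NULL;
    delegate_loop();
    pthread_join(hw, NULL);
    return a.out;
}
```

A caller simply writes `int *sq = hw_square_offload(data, n);` and later calls `free(sq)`, exactly as for a software function. The single static channel is a simplification; supporting several concurrent software and hardware threads would need one channel (or queue) per hardware thread.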