Virtual memory window for application-specific reconfigurable coprocessors

The complexity of hardware/software (HW/SW) interfacing and the lack of portability across different platforms restrain the widespread use of reconfigurable accelerators and limit designer productivity. Furthermore, communication between the SW and HW parts of codesigned applications is typically exposed to both SW programmers and HW designers. In this work, we introduce a virtualization layer that allows reconfigurable application-specific coprocessors to access user-space virtual memory and share the memory address space with user applications. The layer, consisting of an operating system (OS) extension and a HW component, shifts the burden of moving data between processor and coprocessor from the programmer to the OS, lowers the complexity of interfacing, and hides physical details of the system. Not only does the virtualization layer enhance programming abstraction and portability, but it also performs runtime optimizations: by predicting future memory accesses and speculatively prefetching data, it improves coprocessor execution, so applications achieve better performance without any user intervention. We use two different reconfigurable systems-on-chip (SoCs) running Linux, together with codesigned applications, to prove the viability of our concept. The applications run faster than their SW versions, and the overhead due to virtualization is limited. Dynamic prefetching in the virtualization layer further reduces the abstraction overhead.
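To make the mechanism more concrete, here is a minimal Python sketch of how a coprocessor might see user virtual memory through a small local window, with misses serviced by an OS-like routine and a speculative prefetcher. Everything here is an illustrative assumption, not the authors' design: the class and method names (`VirtualMemoryWindow`, `run_kernel`) are hypothetical, and the 4 KiB page size, LRU replacement, write-back of every evicted page, and the simple "fetch the next page" predictor were all chosen only for brevity.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


class VirtualMemoryWindow:
    """Toy model of a coprocessor memory window backed by user virtual memory.

    Assumptions (not the authors' design): page-granular window, LRU
    replacement, write-back of every evicted page, and a next-page prefetcher.
    """

    def __init__(self, user_memory, num_frames=4, prefetch=True):
        if num_frames < 2:
            raise ValueError("need at least two frames so a prefetch cannot evict the page in use")
        self.user_memory = user_memory   # bytearray standing in for the user's virtual buffer
        self.num_frames = num_frames     # local page frames available to the coprocessor
        self.prefetch = prefetch
        self.frames = OrderedDict()      # virtual page number -> local copy, oldest first
        self.faults = 0                  # misses serviced on demand
        self.prefetches = 0              # pages loaded speculatively
        self.last_vpn = None

    def _write_back(self, vpn, data):
        # Copy a local page back to user memory (the last page may be partial).
        start = vpn * PAGE_SIZE
        end = min(start + PAGE_SIZE, len(self.user_memory))
        self.user_memory[start:end] = data[: end - start]

    def _load_page(self, vpn):
        # Stand-in for the OS extension: evict the LRU frame if full, then copy the page in.
        if len(self.frames) >= self.num_frames:
            old_vpn, data = self.frames.popitem(last=False)
            self._write_back(old_vpn, data)
        start = vpn * PAGE_SIZE
        self.frames[vpn] = bytearray(self.user_memory[start:start + PAGE_SIZE])

    def _translate(self, vaddr):
        if not 0 <= vaddr < len(self.user_memory):
            raise IndexError("coprocessor access outside the mapped user buffer")
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.frames:
            self.frames.move_to_end(vpn)  # hit: mark as most recently used
        else:
            self.faults += 1              # miss: the coprocessor would stall here
            self._load_page(vpn)
        # On entering a new page, speculatively fetch the next one.
        if self.prefetch and vpn != self.last_vpn:
            nxt = vpn + 1
            if nxt * PAGE_SIZE < len(self.user_memory) and nxt not in self.frames:
                self.prefetches += 1
                self._load_page(nxt)
        self.last_vpn = vpn
        return self.frames[vpn], offset

    def read(self, vaddr):
        frame, offset = self._translate(vaddr)
        return frame[offset]

    def write(self, vaddr, value):
        frame, offset = self._translate(vaddr)
        frame[offset] = value

    def flush(self):
        # When the coprocessor finishes, copy every resident page back to user memory.
        for vpn, data in self.frames.items():
            self._write_back(vpn, data)
        self.frames.clear()


def run_kernel(prefetch):
    """Hypothetical coprocessor kernel: increment every byte of a 4-page buffer."""
    buf = bytearray(range(256)) * 64  # 16 KiB of user data
    vmw = VirtualMemoryWindow(buf, num_frames=2, prefetch=prefetch)
    for addr in range(len(buf)):
        vmw.write(addr, (vmw.read(addr) + 1) % 256)
    vmw.flush()
    return buf, vmw


if __name__ == "__main__":
    for pf in (False, True):
        _, vmw = run_kernel(pf)
        print(f"prefetch={pf}: faults={vmw.faults}, prefetches={vmw.prefetches}")
```

In this toy model the kernel only ever uses addresses into the user buffer, never physical locations, which mirrors the abstraction the layer aims for. For the sequential scan, the model takes one demand fault per page without prefetching and only a single fault with the next-page predictor, because each page is already resident when the kernel reaches it. Real access patterns and predictors can differ considerably from this simplified case.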
