Dynamic Prefetching in the Virtual Memory Window of Portable Reconfigurable Coprocessors

In Reconfigurable Systems-on-Chip (RSoCs), the operating system has two primary roles: (1) managing the sharing of limited reconfigurable resources, and (2) supporting communication between reconfigurable accelerators and user applications. Previous work has shown that the operating system can dramatically simplify the interface to reconfigurable coprocessors and isolate the programmer from all details of the hardware. Here we develop a further potential of the operating system: it can observe accelerators at runtime and dynamically take actions that improve their execution. The strength of involving the operating system lies in achieving better performance without any information from the end user and without changes to either the coprocessor hardware design or the software application. Specifically, this paper presents an operating system module that monitors reconfigurable coprocessors, predicts their future memory accesses, and performs memory prefetching accordingly; the goal is to completely hide memory-to-memory communication latency. The module relies on lightweight hardware support to detect the coprocessors' memory access patterns. The effectiveness of the technique is demonstrated for two applications on an embedded RSoC board running the Linux operating system. Significant speedup is achieved compared to the non-prefetching version, and the improvement is obtained in a manner completely transparent to the application programmer.
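The monitor, predict, and prefetch loop described above can be illustrated with a toy model. The abstract does not specify the prediction scheme, so the following is only a minimal sketch under stated assumptions: a simple constant-stride detector stands in for the paper's predictor, the coprocessor's accesses are reduced to a trace of virtual page numbers, and the virtual memory window is modeled as a small local memory of `frames` page frames with LRU replacement. The names `StridePrefetcher`, `VirtualMemoryWindow`, `run`, `depth`, and `frames` are hypothetical and not taken from the paper. Prefetching is treated as free, so the model counts page faults rather than measuring time.

```python
from collections import OrderedDict


class StridePrefetcher:
    """Hypothetical stand-in predictor (not the paper's): detects a constant
    stride between successive pages touched by the coprocessor."""

    def __init__(self, depth=1):
        self.depth = depth          # pages to fetch ahead (assumed parameter)
        self.last_page = None
        self.last_stride = None

    def observe(self, page):
        """Record an access to `page`; return the pages predicted to come next."""
        if page == self.last_page:
            return []               # still inside the same page: nothing new
        prefetch = []
        if self.last_page is not None:
            stride = page - self.last_page
            if stride == self.last_stride:
                # Same stride twice in a row: predict that it continues.
                prefetch = [page + stride * k for k in range(1, self.depth + 1)
                            if page + stride * k >= 0]
            self.last_stride = stride
        self.last_page = page
        return prefetch


class VirtualMemoryWindow:
    """Toy model of the coprocessor's local memory: `frames` page frames, LRU."""

    def __init__(self, frames, prefetcher=None):
        self.frames = frames
        self.resident = OrderedDict()   # page -> None, ordered from LRU to MRU
        self.prefetcher = prefetcher
        self.faults = 0

    def _load(self, page):
        """Bring `page` into local memory (or refresh it), evicting the LRU page."""
        if page in self.resident:
            self.resident.move_to_end(page)
            return
        if len(self.resident) >= self.frames:
            self.resident.popitem(last=False)
        self.resident[page] = None

    def access(self, page):
        """Coprocessor touches `page`; a miss means it stalls while the OS copies it in."""
        if page not in self.resident:
            self.faults += 1
        self._load(page)
        if self.prefetcher is not None:
            for p in self.prefetcher.observe(page):
                self._load(p)       # modeled as overlapping perfectly with computation


def run(trace, frames, prefetch):
    """Replay a page trace and return the number of coprocessor page faults."""
    vmw = VirtualMemoryWindow(frames, StridePrefetcher() if prefetch else None)
    for page in trace:
        vmw.access(page)
    return vmw.faults


if __name__ == "__main__":
    trace = list(range(10))   # a coprocessor streaming sequentially through 10 pages
    print(run(trace, frames=2, prefetch=False))   # 10
    print(run(trace, frames=2, prefetch=True))    # 3
```

For the sequential trace, the non-prefetching window faults on every page, while the stride detector locks on after three pages and every later page is already resident when the coprocessor reaches it. In a real system the copy is not free: it is fully hidden only if transferring the next page takes no longer than the time the coprocessor spends working on the current one.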