Time sharing of Runtime Coarse-Grain Reconfigurable Architectures processing elements in multi-process systems

This paper presents a method to time-share the Processing Elements (PEs) of Runtime Coarse-Grain Reconfigurable Architectures (CGRAs) among multiple processes executing concurrently on the same CGRA. Runtime CGRAs time-multiplex the data path, creating a set of contexts, one per state. These contexts configure the PEs and the routing resources of the CGRA and are typically loaded every clock cycle. The target architecture in this work is a commercial CGRA IP core that is embedded into complex SoCs. Our proposed method analyzes the PE utilization in each context when multiple processes run concurrently on the same CGRA. It then time-shares PEs that are assigned to one process but left unused in a given context with the other processes running in parallel. This reduces the total number of PEs required, and hence the size of the CGRA IP and the cost of the SoC. Results show that our method reduces PE utilization by up to 20% (14% on average) and is only 2% worse than the optimal solution, while being much faster to compute.
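The intuition behind time sharing can be illustrated with a toy model. This is a minimal sketch, not the authors' algorithm. It assumes, hypothetically, that every process has the same number of contexts and steps through them in lockstep, so context `i` of all processes is active in the same clock cycle. Under that assumption, the model compares two PE counts:

- Without sharing, each process owns a private PE region sized to its busiest context.
- With sharing, the CGRA only needs enough PEs for the busiest cycle across all processes combined.

The function names and the `usage` layout are illustrative only.

```python
from typing import List

# Hypothetical input: usage[p][i] = number of PEs that process p occupies in
# its context i. For illustration, all processes are assumed to have the same
# number of contexts and to execute them in lockstep.


def _check(usage: List[List[int]]) -> int:
    """Validate the toy input and return the number of contexts."""
    if not usage or any(len(p) != len(usage[0]) for p in usage):
        raise ValueError("all processes must have the same number of contexts")
    return len(usage[0])


def dedicated_pes(usage: List[List[int]]) -> int:
    """PEs needed when each process gets a private region sized to its
    busiest context (no time sharing)."""
    _check(usage)
    return sum(max(contexts) for contexts in usage)


def per_context_demand(usage: List[List[int]]) -> List[int]:
    """Total PEs requested by all processes in each (lockstep) context."""
    n = _check(usage)
    return [sum(process[i] for process in usage) for i in range(n)]


def shared_pes(usage: List[List[int]]) -> int:
    """PEs needed when PEs idle for one process in a context may be lent
    to another process active in the same cycle."""
    return max(per_context_demand(usage))


if __name__ == "__main__":
    usage = [
        [8, 2, 3],  # process A: busiest in its first context
        [1, 6, 2],  # process B: busiest in its second context
    ]
    print(dedicated_pes(usage))       # 8 + 6 = 14
    print(per_context_demand(usage))  # [9, 8, 5]
    print(shared_pes(usage))          # 9
```

The saving in this toy example comes from the two processes peaking in different contexts. Real processes on a runtime CGRA need not run in lockstep, so a practical method must also account for how their contexts actually overlap in time. The figures above are not the paper's results.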
