A host-accelerator communication architecture design for efficient binary acceleration

Binary acceleration of a kernel on an accelerator can suffer from a data duplication problem: data in some address range is copied into the accelerator's local memory, which incurs copy overhead and creates a coherence problem. Configurable Range Memory (CRM) is a memory shared by the host processor and the accelerator whose address range can be configured so that data within that range is loaded directly into it. However, the memory must be designed carefully around the accelerator's memory access patterns to avoid unnecessary overhead. This work presents a new CRM architecture and shows how it improves system performance when combined with a novel Coarse-Grained Reconfigurable Array (CGRA) architecture.
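
The range-based idea behind CRM can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the architecture presented in this work: the class names `ConfigurableRangeMemory` and `SharedSystem`, the single configurable range, and the byte-granular `load`/`store` interface are all hypothetical. It shows one thing only: when host and accelerator accesses are dispatched by address range, data inside the configured range lives in exactly one place, so no copy or coherence action is needed.

```python
class ConfigurableRangeMemory:
    """Toy model of a local memory that claims one configurable address range."""

    def __init__(self, size):
        self.size = size
        self.base = None              # no range configured yet
        self.data = bytearray(size)

    def configure(self, base):
        """Map this memory onto addresses [base, base + size)."""
        self.base = base

    def covers(self, addr):
        return self.base is not None and self.base <= addr < self.base + self.size


class SharedSystem:
    """Hypothetical system: main memory plus one CRM visible to host and accelerator."""

    def __init__(self, main_size, crm_size):
        self.main = bytearray(main_size)
        self.crm = ConfigurableRangeMemory(crm_size)

    def store(self, addr, value):
        # Accesses inside the configured range go to the CRM, all others to main memory.
        if self.crm.covers(addr):
            self.crm.data[addr - self.crm.base] = value
        else:
            self.main[addr] = value

    def load(self, addr):
        if self.crm.covers(addr):
            return self.crm.data[addr - self.crm.base]
        return self.main[addr]


system = SharedSystem(main_size=0x1000, crm_size=0x100)
system.crm.configure(base=0x200)      # assumed kernel data range: 0x200..0x2FF
system.store(0x204, 7)                # host writes kernel input
result = system.load(0x204)           # accelerator reads the same single copy
```

Because both sides use the same `load` and `store` paths in this model, the value written at `0x204` is never duplicated in main memory. A real design would also have to handle data already resident in main memory when the range is configured, which this sketch deliberately omits.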
