Shared memory heterogeneous computation on PCIe-supported platforms

Domain disparity between the CPU and hardware accelerators (HAs) leads to CPU under-utilization and inter-domain data-copy overheads. By exposing HA memory to the OS and the host MMU, these overheads can be eliminated. In this paper, we present a real-system design for shared virtual memory on PCIe-based HAs that enables parallel heterogeneous execution on the CPU and HAs without driver overheads. We extend Linux with a custom memory manager and a custom scheduler to manage HA memory and application cores, respectively. Our FPGA-based multi-application logic design supports simultaneous execution of multiple heterogeneous applications. We show the advantages of heterogeneous execution and analyze how our design reduces OS overhead.
