Reducing Inter-Application Interferences in Integrated CPU-GPU Heterogeneous Architecture

Current heterogeneous CPU-GPU architectures integrate general purpose CPUs and highly thread-level parallelized GPUs (Graphic Processing Units) in the same die. The contention in shared resources between CPU and GPU, such as the last level cache (LLC), interconnection network and DRAM, may degrade both CPU and GPU performance. Our experimental results show that GPU applications tend to have much more power than CPU applications to compete for the shared resources in LLC and on-chip network, and therefore make CPU suffer from more performance loss. To reduce the GPU’s negative impact on CPU performance, we propose a simple yet effective method based on probability to control the LLC replacement policy for reducing the CPU’s inter-core conflict misses caused by GPU without significantly impacting GPU performance. In addition, we develop two strategies to combine the probability based method for the LLC and an existing technique called virtual channel partition (VCP) for the interconnection network to further improve the CPU performance. The first strategy statically uses an empirically pre-determined probability value associated with VCP, which can improve the CPU performance by 26% on average, but degrades GPU performance by 5%. The second strategy uses a sampling method to monitor the network congestion and dynamically adjust the probability value used, which can improve the CPU performance by 24%, and only have 1 or 2% performance overhead on GPU applications.

[1]  Kevin Kai-Wei Chang,et al.  Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[2]  Hyesoon Kim,et al.  TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[3]  Antonia Zhai,et al.  Managing shared last-level cache in a heterogeneous multicore processor , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[4]  Sudhakar Yalamanchili,et al.  Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures , 2013, ACM Trans. Design Autom. Electr. Syst..

[5]  Mahmut T. Kandemir,et al.  Managing GPU Concurrency in Heterogeneous Architectures , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.