Performance Characterisation and Simulation of Intel's Integrated GPU Architecture

Integrated GPUs (iGPUs) are ubiquitous in today's client devices such as laptops and desktops. Examples include Intel's HD and Iris Graphics and AMD's APUs. An iGPU resides on the same chip as the CPU, in contrast to a conventional discrete GPU, which is typically connected over the PCIe bus. Like discrete GPUs, iGPUs are capable of general-purpose computation in addition to traditional graphics roles. iGPUs also differ from discrete GPUs in interesting ways, such as a cache-coherent memory hierarchy and a last-level cache shared with the CPU. Despite their widespread use, they have not been studied extensively. To the best of our knowledge, this paper introduces the first open-source trace generation and microarchitectural simulation framework for Intel's integrated GPUs. We characterise the performance of Intel's Skylake and Kaby Lake GPUs through detailed microbenchmarks, and use the performance evaluations to guide our models and validate the simulator.
