Novel approaches for GPU performance analysis

Most modern embedded GPU architectures use a concept called deferred rendering: a rendering job submitted to the GPU is scheduled for execution at a later point in time. When a graphics application issues a rendering API call (e.g., an OpenGL® ES call), the graphics driver running on the CPU stores the state necessary for that call but does not execute it on the GPU immediately. Instead, the CPU consumes subsequent API calls to build a rendering job for the GPU. When the application wants to display the result of rendering in a window (e.g., by calling SwapBuffers), the CPU submits the constructed job to the GPU. This architecture is especially suited to embedded GPUs because it reduces communication and bandwidth between the CPU and the GPU. Once the job has been submitted, the CPU is free to start preparing the next frame. It is therefore important to keep the different processing units (CPU and GPU) busy in parallel: an application that spends a lot of time on CPU computation will starve the GPU, and vice versa. Understanding the relationship between the CPU and the GPU is vital for developers who want to use the GPU efficiently. Timeline charts capture the amount of time each processing unit is busy; in its most basic form, a timeline chart is a binary chart that indicates whether a processing unit is active at each point in time. This presentation discusses state-of-the-art approaches for capturing timelines and then describes a different approach that moves both capture and visualization to the target device.
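The CPU/GPU relationship described above can be illustrated with a small simulation. This is a minimal sketch in Python, not the capture method presented in the talk. It assumes a fixed CPU cost and a fixed GPU cost per frame, a GPU that executes submitted jobs strictly in order, and a hypothetical `max_in_flight` limit on how many frames the CPU may run ahead of the GPU. That limit stands in for whatever throttling a real driver applies, and the exact policy varies. The helper names `simulate`, `binary_timeline`, `render` and `utilization` are illustrative only. The sketch produces busy intervals for each unit and samples them into the kind of binary timeline chart the abstract describes.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A half-open busy interval [start, end) on one processing unit, in ms."""
    start: float
    end: float


def simulate(cpu_ms, gpu_ms, frames, max_in_flight=2):
    """Simulate deferred job submission for `frames` frames.

    Hypothetical model: each frame costs `cpu_ms` of CPU time to record API
    calls and `gpu_ms` of GPU time to execute; the job is submitted at
    SwapBuffers; the GPU runs jobs in submission order; the CPU may not start
    frame i until the GPU has finished frame i - max_in_flight.
    """
    cpu, gpu = [], []
    cpu_t = 0.0     # time at which the CPU can start its next piece of work
    gpu_free = 0.0  # time at which the GPU finishes its current job
    gpu_done = []   # GPU completion time of each frame
    for i in range(frames):
        if i >= max_in_flight:
            # Throttle: block the CPU until an older frame has left the GPU.
            cpu_t = max(cpu_t, gpu_done[i - max_in_flight])
        # CPU builds the job for frame i; no GPU work happens for it yet.
        cpu.append(Interval(cpu_t, cpu_t + cpu_ms))
        cpu_t += cpu_ms
        # SwapBuffers: the job is submitted and runs once the GPU is free.
        start = max(gpu_free, cpu_t)
        gpu.append(Interval(start, start + gpu_ms))
        gpu_free = start + gpu_ms
        gpu_done.append(gpu_free)
    return cpu, gpu


def binary_timeline(intervals, end, step=1.0):
    """Sample t = 0, step, 2*step, ... below `end`: 1 if busy, else 0."""
    samples = int(end / step)
    return [
        1 if any(iv.start <= k * step < iv.end for iv in intervals) else 0
        for k in range(samples)
    ]


def render(bits):
    """Draw a binary timeline as text: '#' = busy, '.' = idle."""
    return "".join("#" if b else "." for b in bits)


def utilization(intervals, end):
    """Fraction of [0, end) the unit is busy (intervals must not overlap)."""
    return sum(iv.end - iv.start for iv in intervals) / end


if __name__ == "__main__":
    for label, cpu_ms, gpu_ms in [("GPU-bound", 4, 10), ("CPU-bound", 10, 4)]:
        cpu, gpu = simulate(cpu_ms, gpu_ms, frames=3)
        end = max(cpu[-1].end, gpu[-1].end)
        print(label)
        print("  CPU", render(binary_timeline(cpu, end, step=2)),
              f"{utilization(cpu, end):.0%}")
        print("  GPU", render(binary_timeline(gpu, end, step=2)),
              f"{utilization(gpu, end):.0%}")
```

Running the sketch prints two timeline rows per scenario. In the GPU-bound case (4 ms of CPU work and 10 ms of GPU work per frame), the GPU row is almost solid while the CPU row has gaps. In the CPU-bound case the pattern reverses: the GPU sits idle between jobs, which is the starvation the abstract warns about.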

Karthik Hariharakrishnan