Breaking the last line of performance border

The Compute Library for Deep Neural Networks [1] (clDNN) is an open source performance library for Deep Learning (DL) applications, intended to accelerate DL inference on Intel® Processor Graphics. Released in 2017, it works in tight cooperation with Neo [2], the open source OpenCL [3] driver for Intel® Processor Graphics. Together they provide a complete open source software stack for Deep Learning applications. Achieving high performance is usually a long and complex process. The closer a workload gets to the device's theoretical peak performance, the more difficult it is to take the next step. Breaking the last line of the performance border is the most difficult step of all. During development of the clDNN library, performance problems were discovered on a daily basis. This document identifies some of the techniques that were used to resolve those problems and achieve top performance.

[1] Aaftab Munshi, et al. The OpenCL specification. 2009 IEEE Hot Chips 21 Symposium (HCS), 2009.