Accelerated Machine Learning Using TensorFlow and SYCL on OpenCL Devices

Machine learning is used in a growing number of artificial intelligence applications. While existing machine learning frameworks mostly support NVIDIA CUDA GPUs, little research has been dedicated to targeting other devices through open standards such as OpenCL. In this paper, we explain how machine learning applications can harness the power of OpenCL devices through open standards and how, by using SYCL, TensorFlow can be extended to include customized operations running on OpenCL devices.
