SOL: Effortless Device Support for AI Frameworks without Source Code Changes

Modern high performance computing clusters heavily rely on accelerators to overcome the limited compute power of CPUs. These supercomputers run various applications from different domains such as simulations, numerical applications, or artificial intelligence (AI). As a result, vendors need to be able to efficiently run a wide variety of workloads on their hardware. In the AI domain this is exacerbated in particular by the existence of a number of popular frameworks (e.g., PyTorch, TensorFlow, etc.) that have no common code base and can vary in functionality. The code of these frameworks evolves quickly, making it expensive to keep up with all changes and potentially forcing developers to go through constant rounds of upstreaming. In this paper we explore how to provide hardware support in AI frameworks without changing the frameworks' source code in order to minimize maintenance overhead. We introduce SOL, an AI acceleration middleware that provides a hardware abstraction layer, allowing us to transparently support heterogeneous hardware. As a proof of concept, we implemented SOL for PyTorch with three backends: CPUs, GPUs, and vector processors.
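
To illustrate the idea of device support without framework changes, the following is a minimal sketch of what a drop-in middleware wrapper for a PyTorch model could look like from the user's side. The names optimize and OptimizedModule, as well as the backend parameter, are hypothetical and do not reflect SOL's actual API; a real middleware would dispatch to device-specific kernels instead of falling back to the original module.

import torch
import torch.nn as nn

class OptimizedModule(nn.Module):
    """Hypothetical wrapper that stands in for an optimized version of a model."""
    def __init__(self, original: nn.Module, backend: str = "cpu"):
        super().__init__()
        self.original = original  # keep the original module and its parameters
        self.backend = backend    # e.g. "cpu", "cuda", or a vendor device

    def forward(self, *args, **kwargs):
        # A real middleware would run pre-compiled kernels for self.backend here;
        # this sketch simply delegates to the unmodified module.
        return self.original(*args, **kwargs)

def optimize(model: nn.Module, backend: str = "cpu") -> nn.Module:
    """Return a drop-in replacement for `model` targeting `backend`."""
    return OptimizedModule(model, backend)

if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    fast_model = optimize(model, backend="cpu")   # same call signature as before
    out = fast_model(torch.randn(8, 16))
    print(out.shape)                              # torch.Size([8, 4])

Because the wrapper exposes the same nn.Module interface, existing training and inference scripts keep working unchanged, which is the property the paper aims for: the framework's source code is never modified.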
