ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity

GPUs are widely used to accelerate deep learning with convolutional neural networks (CNNs). However, because GPU memory capacity is limited, it is difficult to implement efficient programs that compute large CNNs on a GPU. This paper describes the design and implementation of the out-of-core cuDNN (ooc_cuDNN) library, which supports computing CNNs that exceed GPU memory capacity by using CPU memory as additional capacity. ooc_cuDNN is an extension of cuDNN, a popular high-performance deep learning library for GPUs. To improve performance, ooc_cuDNN divides the CNN computation according to its performance model. In addition, ooc_cuDNN provides fused functions, which combine several kernel functions to reduce extra communication costs. With ooc_cuDNN, we successfully computed a CNN requiring more than 60 GB of memory on a single GPU with 16 GB of memory. Compared with an in-core case using cuDNN, the performance degradation was 13%.
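
The idea of dividing a computation with the help of a performance model can be illustrated with a small sketch. The code below is not the ooc_cuDNN performance model, which the abstract does not specify. It is a toy cost model under stated assumptions: the mini-batch is split into chunks that each fit in GPU memory, host-to-device transfer of later chunks overlaps with computation, and every chunk adds a fixed overhead. The function name `choose_num_chunks` and all of its parameters are hypothetical.

```python
from math import ceil


def choose_num_chunks(batch, bytes_per_sample, gpu_bytes,
                      sec_per_sample, bytes_per_sec, overhead_sec):
    """Return (chunks, estimated_seconds) under a toy cost model.

    Assumed model (illustrative only, not taken from the paper):
    the first chunk's transfer cannot be hidden, the remaining transfer
    overlaps with compute, and each chunk costs a fixed overhead.
    """
    best = None
    for k in range(1, batch + 1):
        chunk = ceil(batch / k)  # samples in the largest chunk
        if chunk * bytes_per_sample > gpu_bytes:
            continue  # this chunk size would not fit in GPU memory
        transfer = batch * bytes_per_sample / bytes_per_sec
        compute = batch * sec_per_sample
        fill = chunk * bytes_per_sample / bytes_per_sec  # pipeline fill
        est = fill + max(transfer - fill, compute) + k * overhead_sec
        if best is None or est < best[1]:
            best = (k, est)
    if best is None:
        raise ValueError("a single sample does not fit in GPU memory")
    return best


# Hypothetical numbers: 64 samples of 1 GB each on a 16 GB GPU.
chunks, seconds = choose_num_chunks(
    batch=64, bytes_per_sample=10**9, gpu_bytes=16 * 10**9,
    sec_per_sample=0.1, bytes_per_sec=16e9, overhead_sec=1.0)
```

In this sketch, fewer chunks mean less per-chunk overhead, while more chunks shorten the part of the transfer that cannot be overlapped. The memory constraint sets the lower bound on the number of chunks.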
