Optimal DNN primitive selection with partitioned boolean quadratic programming

Deep neural networks (DNNs) require very large amounts of computation. Many different algorithms have been proposed to implement their most expensive layers, and each algorithm has a large number of variants with different trade-offs in parallelism, locality, memory footprint, and execution time. In addition, specific algorithms operate much more efficiently on specialized data layouts. We state the problem of optimal primitive selection in the presence of data layout transformations, and show that it is NP-hard by demonstrating an embedding in the Partitioned Boolean Quadratic Assignment problem (PBQP). We propose an analytic solution via a PBQP solver, and evaluate our approach experimentally by optimizing several popular DNNs using a library of more than 70 DNN primitives, on an embedded platform and a general-purpose platform. We show experimentally that a principled analytic solution to primitive selection in the presence of data layout transformations can deliver significant gains over state-of-the-art vendor libraries.
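To make the formulation concrete, the following is a minimal sketch of primitive selection posed as a PBQP instance on a three-layer chain. Everything in it is an assumption made for illustration: the layer names, the primitive names, the layout tags, and all cost values are hypothetical and are not measurements from the evaluation. Each node carries a cost vector (execution time per candidate primitive), and each edge carries a transition cost (the data layout transformation needed between the chosen primitives). The sketch uses exhaustive search in place of a real PBQP solver, which is only feasible for tiny graphs.

```python
from itertools import product

# Hypothetical node costs: node_costs[layer][primitive] = execution time.
# The suffix after the last underscore is the data layout the primitive uses.
node_costs = {
    "conv1": {"im2col_chw": 5.0, "winograd_hwc": 3.0},
    "conv2": {"im2col_chw": 4.0, "winograd_hwc": 2.5},
    "conv3": {"im2col_chw": 3.0, "direct_hwc": 3.5},
}
# Edges of the (hypothetical) network graph: producer layer -> consumer layer.
edges = [("conv1", "conv2"), ("conv2", "conv3")]


def layout(primitive):
    """Return the layout tag encoded in a primitive name, e.g. 'chw'."""
    return primitive.rsplit("_", 1)[1]


def transform_cost(src, dst):
    """Hypothetical edge cost: free if layouts match, else a fixed penalty."""
    return 0.0 if layout(src) == layout(dst) else 1.5


def solve_brute_force(node_costs, edges):
    """Enumerate every assignment and return (minimum cost, assignment)."""
    layers = list(node_costs)
    best_cost, best_choice = float("inf"), None
    for combo in product(*(node_costs[layer] for layer in layers)):
        choice = dict(zip(layers, combo))
        cost = sum(node_costs[layer][choice[layer]] for layer in layers)
        cost += sum(transform_cost(choice[u], choice[v]) for u, v in edges)
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


def solve_greedy(node_costs, edges):
    """Pick the fastest primitive per layer, ignoring layout transformations."""
    choice = {layer: min(c, key=c.get) for layer, c in node_costs.items()}
    cost = sum(node_costs[layer][choice[layer]] for layer in node_costs)
    cost += sum(transform_cost(choice[u], choice[v]) for u, v in edges)
    return cost, choice


best_cost, best_choice = solve_brute_force(node_costs, edges)
greedy_cost, greedy_choice = solve_greedy(node_costs, edges)
print(best_cost, best_choice)
print(greedy_cost, greedy_choice)
```

With these made-up costs, choosing the fastest primitive for each layer in isolation incurs a layout transformation before the last layer, whereas the joint optimum accepts a slightly slower final primitive to keep a single layout throughout. Exhaustive search grows exponentially with the number of layers, which is why a dedicated PBQP solver is used for real networks.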
