Paper information - UNIFIED CONVOLUTION FRAMEWORK: A COMPILER-BASED APPROACH TO SUPPORT SPARSE CONVOLUTIONS

ABSTRACT This paper introduces a Unified Convolution Framework (UCF) that incorporates various existing sparse convolutions in a unified abstraction. This contrasts with the common library-based approach, which requires substantial engineering effort because each sparse convolution must be implemented separately. Instead, UCF takes a tensor compiler approach that can flexibly explore convolutions through various program transformations; to our knowledge, however, no existing compiler flexibly supports the range of sparse convolutions. In particular, the Tensor Algebra Compiler (TACO) supports a variety of sparse formats but cannot declare convolutions, because a tensor cannot be accessed by a linear combination of index variables. We extend TACO’s Einsum language with affine index expressions so that a convolution can be declared. Our method is also compatible with TACO’s format and scheduling languages, enabling various sparse convolution implementations to be explored. Our experimental results demonstrate that TACO-UCF achieves 1.32× and 8.3× average speedups on a filter sparse convolution and a submanifold sparse convolution, respectively, over state-of-the-art libraries on CPU. On GPU, TACO-UCF outperforms the state-of-the-art GPU library on the filter sparse convolution of ResNet50 by an average of 1.47× at 80% sparsity. We also demonstrate that, on the neighbor retrieval step of a submanifold sparse convolution, TACO-UCF outperforms MinkowskiEngine and TorchSparse on GPU by an average of 2.55× and 3.34×, respectively.
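To make the abstract's terminology concrete, the following is a minimal sketch in plain Python. It is not TACO or TACO-UCF code and does not reflect how the paper's compiler generates loops. It only illustrates three ideas: a convolution written with the affine access `inp[x + r]`, a filter sparse variant that visits only nonzero filter taps, and a submanifold variant that produces outputs only at active input sites. All function names, the 1D setting, the (position, value) filter layout, and the use of a Python `dict` for neighbor lookup are assumptions made for illustration.

```python
# Illustrative sketch only: plain Python, not TACO or TACO-UCF code.
# All names and data layouts here are hypothetical.

def dense_conv1d(inp, filt):
    """out[x] = sum over r of inp[x + r] * filt[r].

    The access inp[x + r] combines two index variables, which is the
    kind of affine index expression the abstract refers to.
    """
    out_len = len(inp) - len(filt) + 1
    out = [0.0] * out_len
    for x in range(out_len):
        for r in range(len(filt)):
            out[x] += inp[x + r] * filt[r]
    return out


def filter_sparse_conv1d(inp, filt_len, filt_nonzeros):
    """Same result as dense_conv1d, but the filter is stored as
    (position, value) pairs so only its nonzero taps are visited."""
    out_len = len(inp) - filt_len + 1
    out = [0.0] * out_len
    for x in range(out_len):
        for r, value in filt_nonzeros:
            out[x] += inp[x + r] * value
    return out


def submanifold_conv1d(active, filt):
    """Input is a dict {coordinate: value} of active sites.

    Outputs are produced only at active input coordinates, and each
    neighbor coordinate x + r - center is retrieved by a dict lookup
    (a stand-in for whatever lookup structure a real library uses).
    """
    center = len(filt) // 2
    out = {}
    for x in active:
        acc = 0.0
        for r, weight in enumerate(filt):
            neighbor = active.get(x + r - center)
            if neighbor is not None:
                acc += neighbor * weight
        out[x] = acc
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
pruned_filter = [2.0, 0.0, -1.0]  # one of three taps has been pruned to zero
nonzeros = [(r, v) for r, v in enumerate(pruned_filter) if v != 0.0]

dense_result = dense_conv1d(signal, pruned_filter)
sparse_result = filter_sparse_conv1d(signal, len(pruned_filter), nonzeros)

points = {0: 1.0, 1: 2.0, 5: 3.0}  # sparse input: only three active sites
submanifold_result = submanifold_conv1d(points, [1.0, 1.0, 1.0])
```

In this sketch the dense and filter sparse versions compute the same output, and the sparse one simply skips the zero tap. The submanifold version returns a dict with exactly the same keys as its input, so the set of active sites does not grow from layer to layer.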

J. Emer | Charith Mendis | S. Amarasinghe | Jaeyeon Won | Changwan Hong
