Experiments and optimizations for TVM on RISC-V Architectures with P Extension

TVM is an AI compiler that supports both graph-level and operator-level optimizations for machine learning models, providing an optimizable flow for deployment on diverse target devices. By exploiting TVM schedules, we can tailor code generation for the RISC-V architecture. Since RISC-V is configurable through its choice of extensions, multiple extensions can be enabled depending on the application scenario. In this work, we present a flow for enabling and optimizing the RISC-V P extension for QNN models in TVM. With support from LLVM and a customized deep learning runtime (DLR), we verified our work on both FLOAT32 and pre-quantized models from TensorFlow Lite. Experiments show that, compared with the FLOAT32 models, the pre-quantized versions achieve a 2.7-7.0x improvement in total instruction count at runtime on a set of benchmarks including MobileNet and Inception-v3. As for accuracy, the degradation for the quantized versions is negligible across 500 test images. All experiments were run on Spike, a RISC-V simulator with P extension support.