Experiments and optimizations for TVM on RISC-V Architectures with P Extension

TVM is an AI compiler that supports both graph-level and operator-level optimizations for machine learning models, providing an optimizable flow for deployment on diverse target devices. By exploiting TVM schedules, we can tailor code generation for the RISC-V architecture. Since RISC-V is configurable through its choice of extensions, multiple extensions can be enabled depending on the application scenario. In this work, we present a flow for enabling and optimizing the RISC-V P extension for QNN models in TVM. With support from LLVM and a customized deep learning runtime (DLR), we verified our work on both FLOAT32 and pre-quantized models from TensorFlow Lite. Experiments show that, compared with the FLOAT32 models, the pre-quantized versions achieve a 2.7-7.0x improvement in total instruction count at runtime on a set of benchmarks including MobileNet and Inception-v3. As for accuracy, the degradation for the quantized versions is negligible across 500 test images. All experiments were run on Spike, a RISC-V simulator with P extension support.