Automatic Joint Optimization of Algorithm-Level Compression and Compiler-Based Acceleration with Reinforcement Learning for DNNs on Edge Devices

More accurate machine learning models often incur higher memory cost and require more software-hardware co-adaptation effort to deploy on resource-constrained devices. Model compression techniques and deep learning compilers have been developed to reduce memory cost and latency. However, current methods require tremendous manual engineering effort to optimize each model. This paper introduces a joint learning-based framework that performs the compression task and the acceleration task simultaneously. The joint optimization method auto-tunes the algorithm-level compression and the compiler-based acceleration with reinforcement learning. Experimental results demonstrate that our learning framework compresses the model by a factor of 2 or 8 and accelerates the optimization process by up to 30 times.
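To make the joint search concrete, below is a minimal sketch of the kind of loop the abstract describes: a learning agent that jointly picks an algorithm-level compression action and a compiler-level scheduling action, then updates its value estimates from a combined accuracy/latency reward. All names here (prune_ratio, tile_size, the measurement stand-ins, and the epsilon-greedy bandit agent) are illustrative assumptions, not the paper's actual action space, reward, or RL algorithm.

```python
# Hypothetical sketch of a joint compression + compiler-schedule search loop.
# measure_accuracy / measure_latency are stand-ins for real evaluation on a
# validation set and real compile-and-profile runs on the edge device.
import random

PRUNE_RATIOS = [0.25, 0.5, 0.75]   # algorithm-level compression actions (assumed)
TILE_SIZES = [8, 16, 32, 64]       # compiler-based acceleration actions (assumed)

def measure_accuracy(prune_ratio):
    # Stand-in: accuracy degrades as more of the model is pruned.
    return 0.95 - 0.1 * prune_ratio + random.gauss(0, 0.005)

def measure_latency(prune_ratio, tile_size):
    # Stand-in: latency drops with pruning and depends on the tiling choice.
    base = 10.0 * (1.0 - prune_ratio)
    return base * (1.0 + abs(tile_size - 32) / 64) + random.gauss(0, 0.05)

def reward(acc, lat, lat_budget=8.0):
    # Joint objective: keep accuracy high while meeting a latency budget.
    return acc - max(0.0, lat - lat_budget)

# Epsilon-greedy bandit over the joint (compression, schedule) action space;
# a simple stand-in for the reinforcement learning agent in the framework.
q = {(p, t): 0.0 for p in PRUNE_RATIOS for t in TILE_SIZES}
n = {a: 0 for a in q}
for step in range(200):
    if random.random() < 0.1:
        action = random.choice(list(q))      # explore
    else:
        action = max(q, key=q.get)           # exploit current best estimate
    p, t = action
    r = reward(measure_accuracy(p), measure_latency(p, t))
    n[action] += 1
    q[action] += (r - q[action]) / n[action]  # incremental mean update

best = max(q, key=q.get)
print(f"best prune_ratio={best[0]}, tile_size={best[1]}, value={q[best]:.3f}")
```

The key design point the sketch illustrates is that compression and scheduling actions are chosen and rewarded jointly, so the agent can trade a more aggressive pruning ratio against a less aggressive schedule (or vice versa) rather than tuning the two stages in isolation.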