Improved Early Exiting Activation to Accelerate Edge Inference

As mobile and edge devices become more powerful, on-device deep learning is becoming a reality. However, edge inference still faces many challenges, chiefly limited computing power, memory space, and energy. To address these challenges, model compression techniques such as channel pruning, low-rank representation, network quantization, and early exiting have been introduced to reduce the overall computational load of neural networks. In this paper, we propose an improved method of adding early exiting branches to a pre-defined neural network, so that the network can determine whether an input is easy to process and, if so, use fewer resources to execute the task. Our method starts with an exhaustive search over the activations in a given network, inserts an early exiting module at each one, tests the resulting early exit branches, and selects the branches that are both accurate and fast. Our contribution is a reduction in the computing time of neural networks, achieved by breaking the sequential flow of the model with execution branches. Additionally, by testing all activations in the network, we gain knowledge of the model and insight into where an early exit auxiliary classifier is best placed. We test on a ResNet model and show a reduction in real computation time on single input images.
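The control flow of early exiting can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the exit criterion (a softmax confidence threshold), the function name `early_exit_forward`, and the toy stages and heads are all assumptions made for illustration. The network is modeled as a list of stages, each optionally followed by an auxiliary classifier; inference stops at the first classifier whose confidence reaches the threshold.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_forward(
    x: Vector,
    stages: List[Layer],
    exit_heads: List[Optional[Layer]],
    final_head: Layer,
    threshold: float = 0.9,  # assumed exit criterion, not taken from the paper
) -> Tuple[int, int]:
    """Return (predicted_class, exit_index).

    exit_index is the index of the stage whose auxiliary classifier fired,
    or len(stages) if the input ran through the whole network.
    """
    h = x
    for i, stage in enumerate(stages):
        h = stage(h)
        head = exit_heads[i]
        if head is None:
            continue  # no early exit branch attached to this activation
        probs = softmax(head(h))
        confidence = max(probs)
        if confidence >= threshold:
            # "Easy" input: skip all remaining stages.
            return probs.index(confidence), i
    probs = softmax(final_head(h))
    return probs.index(max(probs)), len(stages)


# Toy two-stage "network" (hypothetical): one exit branch after the first stage.
stages = [lambda v: [2.0 * a for a in v], lambda v: [a + 1.0 for a in v]]
exit_heads = [lambda v: v, None]
final_head = lambda v: v

easy = early_exit_forward([5.0, 0.0], stages, exit_heads, final_head)
hard = early_exit_forward([0.1, 0.2], stages, exit_heads, final_head)
```

In this toy setup the first input produces a confident prediction at the first branch and leaves the network early, while the second falls below the threshold and is classified by the final head. The saving in a real model depends on how much computation lies after the chosen branch and on the cost of the auxiliary classifier itself.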