Optimal Branch Location for Cost-effective Inference on Branchynet

Deep Neural Networks (DNNs) are widely used across many machine learning domains. To achieve higher accuracy, DNNs have become deeper and larger. However, the improvement in accuracy comes at the price of longer inference time and higher energy consumption, and the marginal cost of each additional unit of accuracy grows as the accuracy itself rises.

Branchynet, also known as early exits, is an architecture that addresses this increasing marginal cost. Branchynet adds extra side classifiers (branches) to a DNN model, so that inference on a significant portion of the samples can exit the network early through one of these side branches once the branch is already highly confident in its result. However, Branchynet requires manual tuning of hyperparameters such as the branch locations and the confidence threshold for early exiting, and the quality of this tuning dramatically affects the efficiency of the resulting network. To the best of our knowledge, there is no efficient algorithm for finding the best branch locations, which trade off accuracy against inference time on Branchynet.

We propose an algorithm that finds the optimal branch locations for Branchynet. We formulate the branch placement problem as an optimization problem and prove that it is NP-complete. We then derive a dynamic programming algorithm that runs in pseudo-polynomial time and solves the branch placement problem optimally.

We implement our algorithm and solve the branch placement problem on four types of VGG networks. The experimental results indicate that our dynamic programming finds the branch locations that yield the maximum number of correct classifications within a given time budget. We also run the four VGG models on a GeForce RTX-3090 GPU with the branch combinations found by the dynamic programming, and the results show that the dynamic programming accurately predicts both the number of correct classifications and the execution time on the GPU.
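The abstract does not spell out the problem formulation, so the following is only a toy sketch of what a pseudo-polynomial dynamic program for branch placement under a time budget can look like. It rests on a strong, hypothetical simplification that is not taken from the paper: the aggregate time cost `time[i][j]` and the number of correct classifications `correct[i][j]` produced at exit `j` depend only on the previous exit `i`, not on the whole set of earlier branches. Index 0 is the network input, indices 1..n are candidate branch locations, and index n+1 is the final exit, which is always taken. The function name `place_branches` and the toy tables are illustrative only. Time is measured in integer units, so the running time is O(n^2 · B) for budget B, which is polynomial in the numeric value of B rather than in its bit length (hence "pseudo-polynomial").

```python
from typing import List, Optional, Tuple

NEG = float("-inf")


def place_branches(time: List[List[int]], correct: List[List[int]],
                   budget: int) -> Tuple[int, List[int]]:
    """Toy DP (hypothetical model): choose branch locations that maximize
    correct classifications with total integer time <= budget.

    time[i][j], correct[i][j] (0 <= i < j <= n+1) are assumed precomputed
    costs/gains of going from exit i directly to exit j.
    """
    last = len(time) - 1  # index of the final exit (always taken)
    # best[j][t]: max correct count with the most recent exit at j, using exactly t time
    best = [[NEG] * (budget + 1) for _ in range(last + 1)]
    parent: List[List[Optional[int]]] = [[None] * (budget + 1) for _ in range(last + 1)]
    best[0][0] = 0
    for j in range(1, last + 1):
        for i in range(j):
            cost = time[i][j]
            # Only start times t that still fit the budget after paying `cost`
            for t in range(budget - cost + 1):
                if best[i][t] == NEG:
                    continue
                cand = best[i][t] + correct[i][j]
                if cand > best[j][t + cost]:
                    best[j][t + cost] = cand
                    parent[j][t + cost] = i
    # Best final-exit state over all total times within the budget
    t_best = max(range(budget + 1), key=lambda t: best[last][t])
    if best[last][t_best] == NEG:
        raise ValueError("no feasible placement within the budget")
    # Backtrack through parent pointers to recover the chosen branches
    branches = []
    j, t = last, t_best
    while j != 0:
        i = parent[j][t]
        t -= time[i][j]
        j = i
        if j != 0:
            branches.append(j)
    return int(best[last][t_best]), sorted(branches)


# Made-up toy tables: 2 candidate branches (indices 1, 2), final exit at index 3.
TOY_TIME = [[0, 3, 6, 10],
            [0, 0, 2, 5],
            [0, 0, 0, 2],
            [0, 0, 0, 0]]
TOY_CORRECT = [[0, 50, 70, 90],
               [0, 0, 15, 38],
               [0, 0, 0, 19],
               [0, 0, 0, 0]]

if __name__ == "__main__":
    print(place_branches(TOY_TIME, TOY_CORRECT, 9))  # -> (89, [2])
```

With a budget of 10 time units the toy model keeps no branches (90 correct in 10 units), while tighter budgets of 9 and 7 force the program to insert branches at location 2, or at locations 1 and 2, trading some correct classifications for lower time. A realistic formulation would need costs that account for every earlier branch, which this sketch deliberately ignores.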