BayesTuner: Leveraging Bayesian Optimization For DNN Inference Configuration Selection

Deep learning sits at the core of many applications and products deployed on large-scale infrastructures such as data centers. Since the power consumption of data centers contributes significantly to operational costs and carbon footprint, it is essential to improve their power efficiency. To this end, both the hardware platform and the application should be configured properly. However, automatically identifying the best configuration from a wide range of available options (e.g., the deep neural network (DNN) batch size, the number of cores, and the amount of memory allocated to the application) at an affordable search cost is challenging. Employing an exhaustive approach that tests all possible configurations is infeasible. To tackle this challenge, we introduce BayesTuner, which employs Bayesian Optimization to estimate performance models of DNN inference applications under different configurations using only a few test runs. With these models, BayesTuner can distinguish optimal or near-optimal configurations from the rest of the options. Using a realistic setup with a variety of DNNs, we show that BayesTuner efficiently explores the huge space of possible configurations and minimizes system power consumption while meeting each DNN's throughput constraint.
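To make the setting concrete, the sketch below illustrates the general idea of Bayesian Optimization over such a configuration space; it is not BayesTuner's actual implementation. It uses the off-the-shelf scikit-optimize library, folds the throughput constraint into the objective as a simple penalty term (one common heuristic; the paper's constraint handling may differ), and the `measure_power_watts` and `measure_throughput_ips` functions are hypothetical stand-ins for real test runs.

```python
# Minimal sketch: constrained Bayesian Optimization over a DNN inference
# configuration space (batch size, cores, memory), using scikit-optimize.
# NOT BayesTuner's actual algorithm; an illustration of the approach only.
from skopt import gp_minimize
from skopt.space import Categorical, Integer

def measure_power_watts(batch_size, num_cores, mem_gb):
    # Placeholder: in reality this deploys the configuration, runs the
    # DNN inference workload, and reads a power meter.
    return 20.0 + 3.0 * num_cores + 0.1 * batch_size + 0.5 * mem_gb

def measure_throughput_ips(batch_size, num_cores, mem_gb):
    # Placeholder: in reality this runs the inference benchmark and
    # reports inferences per second.
    return 50.0 * num_cores * min(batch_size, 16) / 16.0

THROUGHPUT_TARGET = 1000.0  # inferences/sec; assumed SLO for illustration
PENALTY = 1e4               # makes constraint-violating points unattractive

# The configuration space; the ranges here are illustrative assumptions.
space = [
    Categorical([1, 2, 4, 8, 16, 32, 64], name="batch_size"),
    Integer(1, 32, name="num_cores"),
    Integer(1, 64, name="mem_gb"),
]

def objective(config):
    batch_size, num_cores, mem_gb = config
    power = measure_power_watts(batch_size, num_cores, mem_gb)
    throughput = measure_throughput_ips(batch_size, num_cores, mem_gb)
    # Fold the throughput constraint into the objective as a penalty,
    # so the optimizer learns to avoid infeasible configurations.
    if throughput < THROUGHPUT_TARGET:
        return power + PENALTY
    return power

# gp_minimize fits a Gaussian-process surrogate to the configurations
# tried so far and uses an acquisition function to pick the next test
# run, so only a small number of measurements (n_calls) are needed.
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best configuration:", result.x, "estimated power:", result.fun)
```

The key property this sketch shares with the approach described above is sample efficiency: rather than sweeping the configuration space exhaustively, the surrogate model is refined with each test run and the search concentrates on promising, constraint-satisfying regions.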