Multi-Branch Networks for Video Super-Resolution With Dynamic Reconstruction Strategy

The rapid development of 2-dimensional (2D) convolutional neural networks (CNNs) has driven single image super-resolution (SISR) into a new era, owing to their powerful ability to model spatial relations within a single image. Video super-resolution (VSR) has received less attention, because it poses an additional challenge: beyond spatial relations, the temporal dependence among consecutive low-resolution (LR) frames must be taken into account for better reconstruction. In this article, unlike most previous methods, which rely on optical flow for motion compensation, we use 3-dimensional (3D) convolution to capture the temporal relation. First, because conventional 3D convolution is notorious for its excessively high computational burden, we propose an efficient 3D convolutional block (E3DB) based on the convolution factorization principle (CFP), which significantly reduces the computing load while preserving as much temporal information as possible. Then, building on E3DB, we propose a novel multi-resolution extraction block (MREB) that aggregates information from multiple resolutions, leading to stronger high-resolution representation learning and better feature extraction. In addition, based on our 3-branch architecture, we propose a dynamic reconstruction strategy (DRS) that adaptively fuses the temporal-dependence information from each branch, instead of using simple addition or concatenation. The resulting network is termed the dynamic multiple branch network (DMBN). Comprehensive experiments on public benchmark datasets demonstrate the superiority of DMBN over current state-of-the-art methods in terms of both accuracy and efficiency.
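To make the two central ideas concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how CFP factorizes the 3D kernel or how DRS computes its fusion weights, so the sketch makes two assumptions: a k x k x k convolution is split into a 1 x k x k spatial convolution followed by a k x 1 x 1 temporal convolution, and branch outputs are fused by a softmax-weighted sum. All function names and the channel width of 64 are hypothetical.

```python
import math


def conv3d_params(c_in, c_out, kt, kh, kw):
    """Number of weights in a 3D convolution (bias omitted)."""
    return c_in * c_out * kt * kh * kw


def factorized_params(c_in, c_out, k):
    """Weights of an ASSUMED factorization: a 1 x k x k spatial
    convolution followed by a k x 1 x 1 temporal convolution."""
    spatial = conv3d_params(c_in, c_out, 1, k, k)
    temporal = conv3d_params(c_out, c_out, k, 1, 1)
    return spatial + temporal


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branch_outputs, scores):
    """ASSUMED form of adaptive fusion: a softmax-weighted sum of the
    branch outputs, in place of plain addition or concatenation."""
    weights = softmax(scores)
    length = len(branch_outputs[0])
    return [
        sum(w * branch[i] for w, branch in zip(weights, branch_outputs))
        for i in range(length)
    ]


# Hypothetical layer: 64 input channels, 64 output channels, kernel size 3.
full = conv3d_params(64, 64, 3, 3, 3)
fact = factorized_params(64, 64, 3)

# Three toy branch outputs fused with equal scores (reduces to the mean).
fused = fuse_branches([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0.0, 0.0, 0.0])
```

Under these assumptions the factorized layer uses 49,152 weights against 110,592 for the full 3D kernel, which illustrates the kind of saving that factorization targets. In a learned setting the fusion scores would be produced by the network rather than fixed by hand.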