End-to-End Learning for Omnidirectional Stereo Matching With Uncertainty Prior

In this paper, we propose a novel end-to-end deep neural network model for omnidirectional depth estimation from a wide-baseline multi-view stereo setup. Images captured by the ultra-wide field-of-view cameras on an omnidirectional rig are processed by a feature extraction module, and the resulting deep feature maps are warped onto concentric spheres swept through all candidate depths using the calibrated camera parameters. A 3D encoder-decoder block takes the aligned feature volume and produces an omnidirectional depth estimate, regularizing uncertain regions with global context information. For more accurate depth estimation, we further propose uncertainty prior guidance in two ways: depth map filtering and guided regularization. In addition, we present large-scale synthetic datasets for training and testing omnidirectional multi-view stereo algorithms. Our datasets consist of 13K ground-truth depth maps and 53K fisheye images captured in four orthogonal directions across various objects and environments. Experimental results show that the proposed method produces accurate depth estimates in both synthetic and real-world environments, outperforming the prior art as well as omnidirectional versions of state-of-the-art conventional stereo algorithms.
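
To make the spherical sweeping step concrete, the following is a minimal PyTorch sketch of warping one camera's feature map onto concentric spheres at the swept candidate depths. It is illustrative rather than the paper's implementation: it assumes an equidistant fisheye projection with focal length `f` and a principal point at the image center, and the function name `sphere_sweep_volume`, resolutions, and depth range are all hypothetical.

```python
import math
import torch
import torch.nn.functional as F

def sphere_sweep_volume(feat, cam_R, cam_t, f, num_depths=32,
                        min_depth=0.5, max_depth=50.0,
                        out_h=80, out_w=160):
    """Warp one camera's feature map onto concentric spheres (a sketch).

    feat  : (C, Hf, Wf) feature map from one fisheye camera
    cam_R : (3, 3) rotation, rig frame -> camera frame
    cam_t : (3,)   translation, rig frame -> camera frame
    f     : focal length of the assumed equidistant fisheye model
    Returns a (C, num_depths, out_h, out_w) feature volume.
    """
    C, Hf, Wf = feat.shape
    device = feat.device

    # Unit rays of the output spherical panorama (latitude, longitude).
    theta = torch.linspace(-math.pi / 2, math.pi / 2, out_h, device=device)
    phi = torch.linspace(-math.pi, math.pi, out_w, device=device)
    theta, phi = torch.meshgrid(theta, phi, indexing="ij")
    rays = torch.stack([torch.cos(theta) * torch.sin(phi),
                        torch.sin(theta),
                        torch.cos(theta) * torch.cos(phi)], dim=-1)  # (H, W, 3)

    # Sweep uniformly in inverse depth, as is common in plane/sphere sweeping.
    inv_depths = torch.linspace(1.0 / max_depth, 1.0 / min_depth,
                                num_depths, device=device)

    volume = []
    for inv_d in inv_depths:
        # 3D points on the sphere of radius 1/inv_d around the rig center.
        pts = rays / inv_d                                    # (H, W, 3)
        # Transform the points into the camera frame.
        pts_cam = pts @ cam_R.T + cam_t                       # (H, W, 3)
        # Equidistant fisheye projection: image radius = f * angle from axis.
        norm = pts_cam.norm(dim=-1).clamp(min=1e-6)
        ang = torch.acos((pts_cam[..., 2] / norm).clamp(-1.0, 1.0))
        r_xy = pts_cam[..., :2].norm(dim=-1).clamp(min=1e-6)
        u = f * ang * pts_cam[..., 0] / r_xy + Wf / 2
        v = f * ang * pts_cam[..., 1] / r_xy + Hf / 2
        # Normalize pixel coordinates to [-1, 1] for grid_sample.
        grid = torch.stack([2 * u / (Wf - 1) - 1,
                            2 * v / (Hf - 1) - 1], dim=-1)    # (H, W, 2)
        warped = F.grid_sample(feat[None], grid[None],
                               align_corners=True,
                               padding_mode="zeros")          # (1, C, H, W)
        volume.append(warped[0])
    return torch.stack(volume, dim=1)                         # (C, D, H, W)
```

Sweeping in inverse depth spaces the spheres densely near the rig, where depth resolution matters most. In the full pipeline, one such volume per camera would be built and fused into the aligned feature volume consumed by the 3D encoder-decoder.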
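
The uncertainty-based depth map filtering can likewise be sketched, under the assumption that per-pixel uncertainty is measured as the entropy of the softmax-normalized matching distribution and the depth is read out with a soft-argmin; the helper name `filter_by_uncertainty` and the threshold value are illustrative, not the paper's exact procedure.

```python
import torch

def filter_by_uncertainty(cost_volume, inv_depths, entropy_thresh=2.0):
    """Read out a depth map and mask uncertain pixels (a sketch).

    cost_volume : (D, H, W) matching costs over D candidate inverse depths
    inv_depths  : (D,) swept inverse-depth values
    Returns (depth, valid_mask); pixels whose cost distribution is too
    flat (high entropy) are marked invalid.
    """
    prob = torch.softmax(-cost_volume, dim=0)                 # (D, H, W)
    # Soft-argmin over inverse depth, then invert to metric depth.
    inv_d = (prob * inv_depths[:, None, None]).sum(dim=0)     # (H, W)
    depth = 1.0 / inv_d.clamp(min=1e-6)
    # Entropy of the per-pixel distribution as the uncertainty measure.
    entropy = -(prob * prob.clamp(min=1e-12).log()).sum(dim=0)
    valid = entropy < entropy_thresh
    return depth, valid
```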