Deep Learning-Based Incorporation of Planar Constraints for Robust Stereo Depth Estimation in Autonomous Vehicle Applications

In autonomous vehicles, depth information about the environment surrounding the vehicle is commonly obtained from time-of-flight (ToF) sensors such as LiDARs and radars. These sensors have limitations that can substantially degrade the quality and utility of the depth information. An alternative is to estimate depth from stereo image pairs. However, stereo matching and depth estimation often fail in ill-posed regions, such as areas with repetitive patterns or textureless surfaces, which are common on planar surfaces. This paper focuses on designing an efficient deep learning framework for stereo depth estimation that is robust in these ill-posed regions. Based on the observation that the disparities of all pixels belonging to a planar area (a scene plane) viewed in two rectified stereo images can be described by an affine transformation, our method predicts pixel-wise affine transformation parameters from the depth information encoded in the aggregated cost volume. We also introduce a propagation term that enforces all pixels belonging to the same scene plane to be transformed with the same parameters. Disparity is then computed by multiplying the predicted affine parameters with the corresponding pixel locations. We evaluated the proposed method on several benchmark datasets. It obtains competitive results while reducing the processing time of common convolutional neural network (CNN) approaches to stereo matching by 50%. Analysis of the results shows that our method can produce reliable estimates in ill-posed regions that are challenging for current state-of-the-art methods.
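The planar observation can be checked under a pinhole model. For a plane aX + bY + cZ = e in camera coordinates, substituting X = (u - c_x)Z/f and Y = (v - c_y)Z/f shows that 1/Z is an affine function of the pixel location (u, v). Disparity d = fB/Z (focal length f, baseline B) is therefore affine as well: d(u, v) = alpha*u + beta*v + gamma. The Python sketch that follows is a minimal illustration under these assumptions, not the authors' implementation. In the proposed method the parameters are predicted per pixel by a network from the aggregated cost volume, whereas here they are computed analytically from a known plane. The camera values, the plane, and every function name (`plane_to_affine`, `affine_disparity`, `disparity_from_depth`, `disparity_map`, `plane_consistency_loss`) are hypothetical. The abstract does not specify the form of the propagation term. `plane_consistency_loss` is therefore only an illustrative stand-in: it assumes plane labels are known and penalizes each pixel's deviation from the mean parameters of its plane.

```python
"""Minimal sketch of the affine disparity model for planar regions.

Assumptions (not taken from the paper): pinhole camera with focal length
`focal` (pixels) and principal point (cx, cy), rectified stereo baseline
`baseline`, and a scene plane a*X + b*Y + c*Z = e in camera coordinates.
All function names are hypothetical.
"""


def plane_to_affine(plane, focal, baseline, cx, cy):
    """Return (alpha, beta, gamma) with d(u, v) = alpha*u + beta*v + gamma."""
    a, b, c, e = plane
    k = baseline / e  # requires e != 0 (plane does not pass through the camera center)
    return k * a, k * b, k * (c * focal - a * cx - b * cy)


def affine_disparity(params, u, v):
    """Disparity as the dot product of (alpha, beta, gamma) with (u, v, 1)."""
    alpha, beta, gamma = params
    return alpha * u + beta * v + gamma


def disparity_from_depth(plane, focal, baseline, cx, cy, u, v):
    """Reference disparity d = focal * baseline / Z for the ray through (u, v)."""
    a, b, c, e = plane
    inv_depth = (a * (u - cx) / focal + b * (v - cy) / focal + c) / e
    return focal * baseline * inv_depth


def disparity_map(param_map):
    """Evaluate per-pixel affine parameters on their own grid (row = v, column = u)."""
    return [[affine_disparity(p, u, v) for u, p in enumerate(row)]
            for v, row in enumerate(param_map)]


def plane_consistency_loss(param_map, labels):
    """Hypothetical stand-in for the propagation term: mean squared deviation of
    each pixel's parameters from the mean parameters of its plane label."""
    groups = {}
    for row_p, row_l in zip(param_map, labels):
        for p, lab in zip(row_p, row_l):
            groups.setdefault(lab, []).append(p)
    total, count = 0.0, 0
    for members in groups.values():
        mean = [sum(m[i] for m in members) / len(members) for i in range(3)]
        for m in members:
            total += sum((m[i] - mean[i]) ** 2 for i in range(3))
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    plane = (0.0, 0.1, 1.0, 10.0)  # example plane, chosen only for illustration
    cam = dict(focal=500.0, baseline=0.5, cx=320.0, cy=240.0)
    params = plane_to_affine(plane, **cam)
    for u, v in [(0, 0), (320, 240), (639, 479)]:
        # The two columns should agree up to floating-point rounding.
        print(u, v, affine_disparity(params, u, v),
              disparity_from_depth(plane, u=u, v=v, **cam))
```

Running the script prints matching values from the affine model and from the depth-based reference. This is the property the method relies on: once (alpha, beta, gamma) are known for a plane, the disparity of every pixel on it, including pixels in textureless or repetitive regions, follows from its coordinates alone.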