Unsupervised Learning of Stereo Vision with Monocular Depth Cues

We demonstrate unsupervised learning of a stereo vision model that incorporates monocular depth cues (shape-from-texture cues). We formulate a conditional probability model that defines the probability of the right image given the left. Because the model is conditional, it does not define a probability distribution over images. Maximizing conditional likelihood rather than joint likelihood is analogous to using a CRF (Conditional Random Field, [6]) rather than an MRF (Markov Random Field, a joint model). The most closely related earlier work appears to be that of Zhang and Seitz [8], who give a method for adapting five parameters of a stereo vision model. In contrast, we train highly parameterized monocular depth cues. We also avoid the need for independence assumptions by using contrastive divergence training, a general method for optimizing CRFs [4]. There is also related work by Saxena et al. on supervised learning of highly parameterized monocular depth cues [1, 2]. Unlike Saxena et al., we train monocular depth cues as part of the unsupervised training of a stereo algorithm. Other related work includes that of Scharstein and Pal [7] and Kong and Tao [5], who perform supervised training of stereo algorithms using general CRF methods. We focus on histogram of oriented gradients (HOG) features as a texture-based surface orientation cue. As a surface is tilted away from the camera, the edges in the direction of the tilt become foreshortened while the edges orthogonal to the tilt do not. The effect on the edge distribution is shown in the image below, where the average HOG feature is shown for regions of tree trunk and forest floor. The cylindrical shape of the tree trunk is clearly indicated by the warping of the HOG feature.
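
The foreshortening effect described above can be illustrated with a small sketch. This is not the authors' feature pipeline: it assumes a synthetic, roughly isotropic texture built from plane waves, simulates tilt by compressing the texture along the x axis by a factor of 2, and uses a simplified single-cell orientation histogram (no block normalization) in place of a full HOG descriptor. The names `orientation_histogram`, `synthetic_texture`, and `horizontal_mass` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def orientation_histogram(image, n_bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Fold orientations into [0, pi) so opposite gradient directions share a bin.
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    return hist / hist.sum()


def synthetic_texture(xs, ys, n_waves=16, freq=0.3, seed=0):
    """Sum of plane waves with evenly spread directions (assumed isotropic texture)."""
    rng = np.random.default_rng(seed)
    thetas = np.linspace(0.0, np.pi, n_waves, endpoint=False)
    phases = rng.uniform(0.0, 2.0 * np.pi, n_waves)
    out = np.zeros_like(xs, dtype=float)
    for theta, phase in zip(thetas, phases):
        out += np.cos(freq * (xs * np.cos(theta) + ys * np.sin(theta)) + phase)
    return out


def horizontal_mass(hist):
    """Histogram mass in the two bins closest to a horizontal gradient direction."""
    return hist[0] + hist[-1]


xs, ys = np.meshgrid(np.arange(128.0), np.arange(128.0))

# Fronto-parallel surface: the texture is sampled as-is.
flat_hist = orientation_histogram(synthetic_texture(xs, ys))

# Surface tilted about the vertical axis: the texture is compressed along x,
# which is modeled here by sampling the same texture at 2 * x.
tilted_hist = orientation_histogram(synthetic_texture(2.0 * xs, ys))

print("flat   horizontal-gradient mass:", horizontal_mass(flat_hist))
print("tilted horizontal-gradient mass:", horizontal_mass(tilted_hist))
```

Under these assumptions, compressing the texture along x steepens intensity changes in that direction, so the histogram mass shifts toward gradients along the tilt direction. A shift of this kind is what makes the orientation histogram usable as a surface orientation cue.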

[1] David A. McAllester, et al. Particle Belief Propagation, 2009, AISTATS.

[2] Andrew P. Witkin, et al. Recovering Surface Shape and Orientation from Texture, 1981, Artif. Intell.

[3] Christopher Joseph Pal, et al. On Learning Conditional Random Fields for Stereo, 2007, International Journal of Computer Vision.

[4] Ashutosh Saxena, et al. 3-D Depth Reconstruction from a Single Still Image, 2007, International Journal of Computer Vision.

[5] Li Zhang, et al. Parameter estimation for MRF stereo, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.

[7] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8] Miguel Á. Carreira-Perpiñán, et al. On Contrastive Divergence Learning, 2005, AISTATS.

[9] Carlo Tomasi, et al. Multiway cut for stereo and motion with slanted surfaces, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[10] Li Zhang, et al. Estimating Optimal Parameters for MRF Stereo from a Single Image Pair, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Andrea Fusiello, et al. Quasi-Euclidean uncalibrated epipolar rectification, 2008, 2008 19th International Conference on Pattern Recognition.

[12] Andrew Blake, et al. Shape from Texture: Estimation, Isotropy and Moments, 1990, Artif. Intell.

[13] Ashutosh Saxena, et al. Learning Depth from Single Monocular Images, 2005, NIPS.

[14] Stan Z. Li, et al. Markov Random Field Modeling in Computer Vision, 1995, Computer Science Workbench.

[15] P. F. Felzenszwalb. Efficiently computing a good segmentation, 1998.

[16] Andreas Klaus, et al. Segment-Based Stereo Matching Using Belief Propagation and a Self-Adapting Dissimilarity Measure, 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[17] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[18] Hai Tao, et al. A method for learning matching errors for stereo computation, 2004, BMVC.

[19] J. Aloimonos. Shape from texture, 1988, Biological Cybernetics.

[20] Ashutosh Saxena, et al. Depth Estimation Using Monocular and Stereo Cues, 2007, IJCAI.

[21] J. Laurie Snell, et al. Markov Random Fields and Their Applications, 1980.

[22] D. Forsyth, et al. Recovering shape and irradiance maps from rich dense texton fields, 2004, CVPR 2004.