UnrealStereo: Controlling Hazardous Factors to Analyze Stereo Vision

A reliable stereo algorithm is critical for many robotics applications. But textureless and specular regions can easily cause failure by making feature matching difficult. Understanding whether an algorithm is robust to these hazardous regions is important. Although many stereo benchmarks have been developed to evaluate performance, it is hard to quantify the effect of hazardous regions in real images because the location and severity of these regions are unknown. In this paper, we develop a synthetic image generation tool enabling to control hazardous factors, such as making objects more specular or transparent, to produce hazardous regions at different degrees. The densely controlled sampling strategy in virtual worlds enables to effectively stress test stereo algorithms by varying the types and degrees of the hazard. We generate a large synthetic image dataset with automatically computed hazardous regions and analyze algorithms on these regions. The observations from synthetic images are further validated by annotating hazardous regions in real-world datasets Middlebury and KITTI (which gives a sparse sampling of the hazards). Our synthetic image generation tool is based on a game engine Unreal Engine 4 and will be open-source along with the virtual scenes in our experiments. Many publicly available realistic game contents can be used by our tool to provide an enormous resource for development and evaluation of algorithms.

[1]  Andreas Geiger,et al.  Displets: Resolving stereo ambiguities using object knowledge , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  David J. Fleet,et al.  Performance of optical flow techniques , 1994, International Journal of Computer Vision.

[3]  Alan L. Yuille,et al.  UnrealCV: Connecting Computer Vision to Unreal Engine , 2016, ECCV Workshops.

[4]  Andreas Geiger,et al.  Efficient Large-Scale Stereo Matching , 2010, ACCV.

[5]  ARNO KNAPITSCH,et al.  Tanks and temples , 2017, ACM Trans. Graph..

[6]  Reinhard Klette,et al.  Robustness evaluation of stereo algorithms on long stereo sequences , 2009, 2009 IEEE Intelligent Vehicles Symposium.

[7]  Enhua Wu,et al.  Constant Time Weighted Median Filtering for Stereo Matching and Beyond , 2013, 2013 IEEE International Conference on Computer Vision.

[8]  Ali Borji,et al.  iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Alan L. Yuille,et al.  Adversarial Examples for Semantic Segmentation and Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Xi Wang,et al.  High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth , 2014, GCPR.

[11]  Oliver Zendel,et al.  Analyzing Computer Vision Data — The Good, the Bad and the Ugly , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Daniel Kondermann,et al.  Synthesizing Real World Stereo Challenges , 2013, GCPR.

[13]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Magnus Wrenninge,et al.  Procedural Modeling and Physically Based Rendering for Synthetic Data Generation in Automotive Applications , 2017, ArXiv.

[16]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[17]  Yee-Hong Yang,et al.  Evaluation of constructable match cost measures for stereo correspondence using cluster ranking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Richard Szeliski,et al.  High-accuracy stereo depth maps using structured light , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[20]  Atsuto Maki,et al.  Towards a simulation driven stereo vision system , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[21]  Richard Szeliski,et al.  An Experimental Comparison of Stereo Algorithms , 1999, Workshop on Vision Algorithms.

[22]  Bernd Jähne,et al.  Outdoor stereo camera system for the generation of real-world benchmark data sets , 2012 .

[23]  Lena Maier-Hein,et al.  The HCI Stereo Metrics: Geometry-Aware Performance Analysis of Stereo Algorithms , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Richard Szeliski,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[25]  Radim Sára,et al.  Dense Stereomatching Algorithm Performance for View Prediction and Structure Reconstruction , 2003, SCIA.

[26]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[27]  Jason Yosinski,et al.  Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Johannes L. Schönberger,et al.  Supplementary Material for A MultiView Stereo Benchmark with High-Resolution Images and Multi-Camera Videos , 2017 .

[29]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Raquel Urtasun,et al.  Efficient Joint Segmentation, Occlusion Labeling, Stereo and Flow Estimation , 2014, ECCV.

[31]  Qingxiong Yang,et al.  Near Real-time Stereo for Weakly-Textured Scenes , 2008, BMVC.

[32]  Oliver Zendel,et al.  CV-HAZOP: Introducing Test Data Validation for Computer Vision , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Brian Karis,et al.  Real Shading in Unreal Engine 4 by , 2013 .

[34]  Varun Jampani,et al.  Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[35]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  H. Hirschmüller Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information , 2005, CVPR.

[37]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Heiko Hirschmüller,et al.  Evaluation of Cost Functions for Stereo Matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Ying Xiong,et al.  Low-level vision by consensus in a spatial hierarchy of regions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[41]  Andrew W. Fitzgibbon,et al.  Reflection Modeling for Passive Stereo , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).