Model-driven Simulations for Deep Convolutional Neural Networks

The use of simulated virtual environments to train deep convolutional neural networks (CNN) is a currently active practice to reduce the (real)data-hungriness of the deep CNN models, especially in application domains in which large scale real data and/or groundtruth acquisition is difficult or laborious. Recent approaches have attempted to harness the capabilities of existing video games, animated movies to provide training data with high precision groundtruth. However, a stumbling block is in how one can certify generalization of the learned models and their usefulness in real world data sets. This opens up fundamental questions such as: What is the role of photorealism of graphics simulations in training CNN models? Are the trained models valid in reality? What are possible ways to reduce the performance bias? In this work, we begin to address theses issues systematically in the context of urban semantic understanding with CNNs. Towards this end, we (a) propose a simple probabilistic urban scene model, (b) develop a parametric rendering tool to synthesize the data with groundtruth, followed by (c) a systematic exploration of the impact of level-of-realism on the generality of the trained CNN model to real world; and domain adaptation concepts to minimize the performance bias.

[1]  Florent Lafarge,et al.  Geometric Feature Extraction by a Multimarked Point Process , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  D. Stoyan,et al.  Fractals, random shapes and point fields : methods of geometrical statistics , 1996 .

[3]  P. Diggle,et al.  Spatial point pattern analysis and its application in geographical epidemiology , 1996 .

[4]  Takeo Kanade,et al.  Learning scene-specific pedestrian detectors without real data , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Greg Humphreys,et al.  Physically Based Rendering: From Theory to Implementation , 2004 .

[6]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[7]  T. Vaudrey,et al.  Differences between stereo and motion behaviour on synthetic and real-world stereo sequences , 2008, 2008 23rd International Conference Image and Vision Computing New Zealand.

[8]  Jiaolong Xu,et al.  Domain Adaptation of Deformable Part-Based Models , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Slobodan Ilic,et al.  Framework for Generation of Synthetic Ground Truth Data for Driver Assistance Applications , 2013, GCPR.

[10]  Antonio M. López,et al.  Virtual and Real World Adaptation for Pedestrian Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  S. Meister,et al.  Real versus realistically rendered scenes for optical flow evaluation , 2011, 2011 14th ITG Conference on Electronic Media Technology.

[13]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[14]  Joshua B. Tenenbaum,et al.  Picture: A probabilistic programming language for scene perception , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Jianxiong Xiao,et al.  DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Patrick Weber,et al.  OpenStreetMap: User-Generated Street Maps , 2008, IEEE Pervasive Computing.

[17]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Visvanathan Ramesh,et al.  Simulations for Validation of Vision Systems , 2015, ArXiv.

[19]  Visvanathan Ramesh,et al.  Model Validation for Vision Systems via Graphics Simulation , 2015, ArXiv.

[20]  Pushmeet Kohli,et al.  Robust Higher Order Potentials for Enforcing Label Consistency , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).