Parallel vision for perception and understanding of complex scenes: methods, framework, and perspectives

In image and vision computing, an algorithm's generalization capability often determines whether it works well in complex scenes. This review article surveys the use of photorealistic image synthesis methods for addressing problems of visual perception and understanding. The ACP Methodology, which comprises artificial systems, computational experiments, and parallel execution, currently plays an essential role in the modeling and control of complex systems. This paper extends the ACP Methodology to computer vision by proposing the concept and basic framework of Parallel Vision. We first review previous work related to Parallel Vision in terms of synthetic data generation and utilization, and detail the utility of synthetic data for feature analysis, object analysis, scene analysis, and other analyses. We then propose the basic framework of Parallel Vision, which is composed of an ACP trilogy (artificial scenes, computational experiments, and parallel execution). We also present some in-depth thoughts and perspectives on Parallel Vision. This paper emphasizes the significance of synthetic data to vision system design and suggests a novel research methodology for perception and understanding of complex scenes.
