EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes

Multimodal large-scale datasets for outdoor scenes are mostly designed for urban driving problems. The scenes are highly structured and semantically different from scenarios seen in nature-centered scenes such as gardens or parks. To promote machine learning methods for nature-oriented applications, such as agriculture and gardening, we propose the multimodal synthetic dataset for Enclosed garDEN scenes (EDEN). The dataset features more than 300K images captured from more than 100 garden models. Each image is annotated with various low/high-level vision modalities, including semantic segmentation, depth, surface normals, intrinsic colors, and optical flow. Experimental results on the state-of-the-art methods for semantic segmentation and monocular depth prediction, two important tasks in computer vision, show positive impact of pre-training deep networks on our dataset for unstructured natural scenes. The dataset and related materials will be available at this https URL.

[1]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[2]  Arnold W. M. Smeulders,et al.  Estimating small differences in car-pose from orbits , 2018, BMVC.

[3]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[4]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Dacheng Tao,et al.  Deep Ordinal Regression Network for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Silvio Savarese,et al.  Joint 2D-3D-Semantic Data for Indoor Scene Understanding , 2017, ArXiv.

[7]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[8]  Wolfram Burgard,et al.  Deep Multispectral Semantic Scene Understanding of Forested Environments Using Multimodal Fusion , 2016, ISER.

[9]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Andrew J. Chosak,et al.  OVVV: Using Virtual Worlds to Design and Evaluate Surveillance Systems , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Vladlen Koltun,et al.  Playing for Benchmarks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Luc Van Gool,et al.  Dynamic 3D Scene Analysis from a Moving Vehicle , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Robert B. Fisher,et al.  The Second Workshop on 3D Reconstruction Meets Semantics: Challenge Results Discussion , 2018, ECCV Workshops.

[14]  Theo Gevers,et al.  Joint Learning of Intrinsic Images and Semantic Segmentation , 2018, ECCV.

[15]  Ken Perlin,et al.  An image synthesizer , 1988 .

[16]  Philipp Krähenbühl,et al.  Free Supervision from Video Games , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[19]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[20]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[21]  David J. Fleet,et al.  Performance of optical flow techniques , 1994, International Journal of Computer Vision.

[22]  Theo Gevers,et al.  ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition , 2019, ArXiv.

[23]  Theo Gevers,et al.  Object features and face detection performance: Analyses with 3D-rendered synthetic data , 2021, 2020 25th International Conference on Pattern Recognition (ICPR).

[24]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Antonio Torralba,et al.  Evaluation of image features using a photorealistic virtual world , 2011, 2011 International Conference on Computer Vision.

[26]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Noah Snavely,et al.  Intrinsic images in the wild , 2014, ACM Trans. Graph..

[30]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[31]  Peter Kontschieder,et al.  The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[33]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Chunhua Shen,et al.  Enforcing Geometric Constraints of Virtual Normal for Depth Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Sergey Tulyakov,et al.  Transformable Bottleneck Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[38]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[39]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[40]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[42]  Jake K. Aggarwal,et al.  On the computation of motion from sequences of images-A review , 1988, Proc. IEEE.

[43]  Jason Weber,et al.  Creation and rendering of realistic trees , 1995, SIGGRAPH.

[44]  Yubin Kuang,et al.  Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Theo Gevers,et al.  Three for one and one for three: Flow, Segmentation, and Surface Normals , 2018, BMVC.

[46]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[47]  Edward H. Adelson,et al.  Ground truth dataset and baseline evaluations for intrinsic image algorithms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[48]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[49]  Jochen Hemming,et al.  Data synthesis methods for semantic segmentation in agriculture: A Capsicum annuum dataset , 2018, Comput. Electron. Agric..

[50]  Balazs Kovacs,et al.  Shading Annotations in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Sebastian Ramos,et al.  The Cityscapes Dataset , 2015, CVPR 2015.

[52]  Roberto Cipolla,et al.  Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.