UASOL, a large-scale high-resolution outdoor stereo dataset

In this paper, we propose a new dataset for outdoor depth estimation from single and stereo RGB images. The dataset was acquired from the point of view of a pedestrian. Currently, the most novel approaches take advantage of deep learning-based techniques, which have proven to outperform traditional state-of-the-art computer vision methods. Nonetheless, these methods require large amounts of reliable ground-truth data. Although several datasets already exist that could be used for depth estimation, almost none of them are outdoor-oriented from an egocentric point of view. Our dataset introduces a large number of high-definition pairs of color frames and corresponding depth maps from a human perspective. In addition, the proposed dataset also features human interaction and great variability of data, as shown in this work.

Design Type(s): modeling and simulation objective • image creation and editing objective • database creation objective
Measurement Type(s): image
Technology Type(s): digital camera
Factor Type(s): geographic location
Sample Characteristic(s): Alicante • anthropogenic habitat

A machine-accessible metadata file describing the reported data is available in ISA-Tab format.
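Since the dataset pairs stereo color frames with depth maps, the link between the two is the standard pinhole stereo relation, depth = f · B / d, where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity. The sketch below illustrates that relation; the focal length and baseline values are illustrative placeholders, not the actual parameters of the camera used to record the dataset.

```python
# Minimal sketch of the standard stereo depth relation (depth = f * B / d).
# focal_px and baseline_m below are assumed example values, NOT the real
# calibration of the dataset's stereo rig.

def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Convert a pixel disparity to metric depth via the pinhole stereo model."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Example with made-up parameters: focal length 1400 px, baseline 12 cm.
depth_m = disparity_to_depth(70.0, focal_px=1400.0, baseline_m=0.12)
print(round(depth_m, 2))  # a 70 px disparity maps to 2.4 m under these assumptions
```

Note that depth resolution degrades with distance: because depth is inversely proportional to disparity, a one-pixel disparity error causes a much larger depth error for far objects than for near ones, which is why high-resolution imagery matters for outdoor stereo.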