RTMV: A Ray-Traced Multi-View Synthetic Dataset for Novel View Synthesis

We present a large-scale synthetic dataset for novel view synthesis consisting of ~300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 x 1600 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis, thus providing a large unified benchmark for both training and evaluation. Using 4 distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures. Because our dataset is too large for existing methods to process, we propose Sparse Voxel Light Field (SVLF), an efficient voxel-based light field approach for novel view synthesis that achieves comparable performance to NeRF on synthetic data, while being an order of magnitude faster to train and two orders of magnitude faster to render. SVLF achieves this speed by relying on a sparse voxel octree, careful voxel sampling (requiring only a handful of queries per ray), and reduced network structure; as well as ground truth depth maps at training time. Our dataset is generated by NViSII, a Python-based ray tracing renderer, which is designed to be simple for non-experts to use and share, flexible and powerful through its use of scripting, and able to create high-quality and physically-based rendered images. Experiments with a subset of our dataset allow us to compare standard methods like NeRF and mip-NeRF for single-scene modeling, and pixelNeRF for category-level modeling, pointing toward the need for future improvements in this area.

[1]  T. Müller,et al.  Instant neural graphics primitives with a multiresolution hash encoding , 2022, ACM Trans. Graph..

[2]  Ken Goldberg,et al.  Dex-NeRF: Using a Neural Radiance Field to Grasp Transparent Objects , 2021, CoRL.

[3]  Leon L. Xu,et al.  ABO: Dataset and Benchmarks for Real-World 3D Object Understanding , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Thomas J. Cashman,et al.  Fake it till you make it: face analysis in the wild using synthetic data alone , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Patrick Labatut,et al.  Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  J.-Y. Zhu,et al.  Advances in Neural Rendering , 2021, SIGGRAPH Courses.

[7]  C. Theobalt,et al.  NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction , 2021, NeurIPS.

[8]  Stephen Tyree,et al.  NViSII: A Scriptable Tool for Photorealistic Image Generation , 2021, ArXiv.

[9]  Dong Li,et al.  ROBI: A Multi-View Dataset for Reflective Objects in Robotic Bin-Picking , 2021, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10]  Andreas Geiger,et al.  UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Tobias Ritschel,et al.  Unsupervised Learning of 3D Object Categories from Videos in the Wild , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Hao Su,et al.  MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Ren Ng,et al.  PlenOctrees for Real-time Rendering of Neural Radiance Fields , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Pratul P. Srinivasan,et al.  Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  C. R. A. Chaitanya,et al.  DONeRF: Towards Real‐Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks , 2021, Comput. Graph. Forum.

[16]  Pratul P. Srinivasan,et al.  IBRNet: Learning Multi-View Image-Based Rendering , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Charles T. Loop,et al.  Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jiajun Wu,et al.  Neural Radiance Flow for 4D View Synthesis and Video Processing , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Frank Dellaert,et al.  Neural Volume Rendering: NeRF And Beyond , 2020, ArXiv.

[20]  Chia-Kai Liang,et al.  Portrait Neural Radiance Fields from a Single Image , 2020, ArXiv.

[21]  Justus Thies,et al.  Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Angjoo Kanazawa,et al.  pixelNeRF: Neural Radiance Fields from One or Few Images , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Pratul P. Srinivasan,et al.  Learned Initializations for Optimizing Coordinate-Based Neural Representations , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jiajun Wu,et al.  pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Francesc Moreno-Noguer,et al.  D-NeRF: Neural Radiance Fields for Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Zhengqi Li,et al.  Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Jonathan T. Barron,et al.  Nerfies: Deformable Neural Radiance Fields , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Wei Jiang,et al.  DeRF: Decomposed Radiance Fields , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Changil Kim,et al.  Space-time Neural Irradiance Fields for Free-Viewpoint Video , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Kai Zhang,et al.  NeRF++: Analyzing and Improving Neural Radiance Fields , 2020, ArXiv.

[31]  Alex Trevithick,et al.  GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering , 2020, ArXiv.

[32]  Mathieu Aubry,et al.  CosyPose: Consistent multi-view multi-object 6D pose estimation , 2020, ECCV.

[33]  Gernot Riegler,et al.  Free View Synthesis , 2020, ECCV.

[34]  Jonathan T. Barron,et al.  NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Zhengqi Li,et al.  Crowdsampling the Plenoptic Function , 2020, ECCV.

[36]  Gil Elbaz,et al.  EXPO-HD: Exact Object Perception using High Distraction Synthetic Data , 2020, ArXiv.

[37]  Kyaw Zaw Lin,et al.  Neural Sparse Voxel Fields , 2020, NeurIPS.

[38]  Andreas Geiger,et al.  GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis , 2020, NeurIPS.

[39]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[40]  Leonidas J. Guibas,et al.  SAPIEN: A SimulAted Part-Based Interactive ENvironment , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  N. Mitra,et al.  Learning a Neural 3D Texture Space From 2D Exemplars , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Long Quan,et al.  BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Sanja Fidler,et al.  Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research , 2019, ArXiv.

[44]  Shuran Song,et al.  Clear Grasp: 3D Shape Estimation of Transparent Objects for Manipulation , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[45]  Christoph Heindl,et al.  Learning to Predict Robot Keypoints Using Artificially Generated Images , 2019, 2019 24th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA).

[46]  Gordon Wetzstein,et al.  Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[47]  Paul Debevec,et al.  DeepView: View Synthesis With Learned Gradient Descent , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Ravi Ramamoorthi,et al.  Local light field fusion , 2019, ACM Trans. Graph..

[49]  Jonathan T. Barron,et al.  Pushing the Boundaries of View Extrapolation With Multiplane Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Marc Alexa,et al.  ABC: A Big CAD Model Dataset for Geometric Deep Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Jan Kautz,et al.  Extreme View Synthesis , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[53]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Hao Zhang,et al.  Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Dieter Fox,et al.  Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects , 2018, CoRL.

[56]  John Flynn,et al.  Stereo magnification , 2018, ACM Trans. Graph..

[57]  Johannes Hanika,et al.  Monte Carlo Methods for Volumetric Light Transport Simulation , 2018, Comput. Graph. Forum.

[58]  Varun Jampani,et al.  Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[59]  Stanley T. Birchfield,et al.  Falling Things: A Synthetic Dataset for 3D Object Detection and Pose Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[60]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[61]  Ali Farhadi,et al.  AI2-THOR: An Interactive 3D Environment for Visual AI , 2017, ArXiv.

[62]  Li Zhang,et al.  Soft 3D reconstruction for view synthesis , 2017, ACM Trans. Graph..

[63]  Jitendra Malik,et al.  Learning a Multi-View Stereo Machine , 2017, NIPS.

[64]  Jan Kautz,et al.  Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Jitendra Malik,et al.  Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Xiaolin Hu,et al.  UnrealStereo: A Synthetic Dataset for Analyzing Stereo Vision , 2016, ArXiv.

[67]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[68]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71]  Roberto Cipolla,et al.  SceneNet: Understanding Real World Indoor Scenes With Synthetic Data , 2015, ArXiv.

[72]  Peter Robinson,et al.  Rendering of Eyes for Eye-Shape Registration and Gaze Estimation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[73]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[74]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[75]  Henrik Aanæs,et al.  Large Scale Multi-view Stereopsis Evaluation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[76]  Bernardo Iraci,et al.  Blender Cycles: Lighting and Rendering Cookbook , 2013 .

[77]  George Drettakis,et al.  Depth synthesis and local warps for plausible image-based navigation , 2013, TOGS.

[78]  George Drettakis,et al.  Silhouette‐Aware Warping for Image‐Based Rendering , 2011, Comput. Graph. Forum.

[79]  David K. McAllister,et al.  OptiX: a general purpose ray tracing engine , 2010, ACM Trans. Graph..

[80]  Richard Szeliski,et al.  Piecewise planar stereo for image-based rendering , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[81]  Michael Bosse,et al.  Unstructured lumigraph rendering , 2001, SIGGRAPH.

[82]  Jitendra Malik,et al.  Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach , 1996, SIGGRAPH.

[83]  Richard Szeliski,et al.  The lumigraph , 1996, SIGGRAPH.

[84]  Donald D. Hoffman,et al.  Unity of perception , 1991, Cognition.

[85]  NeRF-GTO: Using a Neural Radiance Field to Grasp Transparent Objects , 2021 .

[86]  P. Shirley,et al.  A Ray-Box Intersection Algorithm and Efficient Dynamic Voxel Rendering , 2018 .

[87]  Abhishek Kar,et al.  Learning a MultiView Stereo Machine , 2017 .

[88]  Extending the Disney BRDF to a BSDF with Integrated Subsurface Scattering , 2015 .