Kubric: A scalable dataset generator

Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap 2) supports rich ground-truth annotations 3) offers full control over data and 4) can circumvent or mitigate problems regarding bias, privacy and licensing. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines, and generating TBs of data. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation. We release Kubric, the used assets, all of the generation code, as well as the rendered datasets for reuse and modification.

[1]  Andrea Tagliasacchi,et al.  NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes , 2021, Trans. Mach. Learn. Res..

[2]  Mehdi S. M. Sajjadi,et al.  Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Gamaleldin F. Elsayed,et al.  Conditional Object-Centric Learning from Video , 2021, ICLR.

[4]  Leon L. Xu,et al.  ABO: Dataset and Benchmarks for Real-World 3D Object Understanding , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jitendra Malik,et al.  Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Thomas J. Cashman,et al.  Fake it till you make it: face analysis in the wild using synthetic data alone , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Sujoy Ganguly,et al.  Unity Perception: Generate Synthetic Data for Computer Vision , 2021, ArXiv.

[8]  Angel X. Chang,et al.  Habitat 2.0: Training Home Assistants to Rearrange their Habitat , 2021, NeurIPS.

[9]  Yaron Lipman,et al.  Volume Rendering of Neural Implicit Surfaces , 2021, NeurIPS.

[10]  Andrea Vedaldi,et al.  PASS: An ImageNet replacement for self-supervised pretraining without humans , 2021, NeurIPS Datasets and Benchmarks.

[11]  Matthew Botvinick,et al.  SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition , 2021, NeurIPS.

[12]  Vincent Sitzmann,et al.  Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering , 2021, NeurIPS.

[13]  Varun Jampani,et al.  LASR: Learning Articulated Shape Reconstruction from a Monocular Video , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Sanja Fidler,et al.  DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Xiaolong Wang,et al.  Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Stefan Leutenegger,et al.  In-Place Scene Labelling and Understanding with Implicit Scene Representation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Angjoo Kanazawa,et al.  pixelNeRF: Neural Radiance Fields from One or Few Images , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[19]  Philip H. S. Torr,et al.  HOTA: A Higher Order Metric for Evaluating Multi-object Tracking , 2020, International Journal of Computer Vision.

[20]  Jonathan T. Barron,et al.  NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Josh H. McDermott,et al.  ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation , 2020, NeurIPS Datasets and Benchmarks.

[22]  Vinay Uday Prabhu,et al.  Large image datasets: A pyrrhic win for computer vision? , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[23]  Tao Li,et al.  Structure-Measure: A New Way to Evaluate Foreground Maps , 2017, International Journal of Computer Vision.

[24]  Jaegul Choo,et al.  AnimeCeleb: Large-Scale Animation CelebFaces Dataset via Controllable 3D Synthetic Models , 2021, ArXiv.

[25]  Christoph Bregler,et al.  SimPose: Effectively Learning DensePose and Surface Normals of People from Simulated Data , 2020, ECCV.

[26]  Abhinav Gupta,et al.  Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases , 2020, NeurIPS.

[27]  R. Ramamoorthi,et al.  OpenRooms: An Open Framework for Photorealistic Indoor Scene Datasets , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Thomas Kipf,et al.  Object-Centric Learning with Slot Attention , 2020, NeurIPS.

[29]  Ling Shao,et al.  Camouflaged Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yichao Zhou,et al.  ManifoldPlus: A Robust and Scalable Watertight Manifold Surface Generation Method for Triangle Soups , 2020, ArXiv.

[31]  Martin Jägersand,et al.  U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection , 2020, Pattern Recognit..

[32]  Sven Behnke,et al.  Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[33]  Jia Deng,et al.  RAFT: Recurrent All-Pairs Field Transforms for Optical Flow , 2020, ECCV.

[34]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[35]  S. Gelly,et al.  Big Transfer (BiT): General Visual Representation Learning , 2019, ECCV.

[36]  Fei-Fei Li,et al.  Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy , 2019, FAT*.

[37]  Andreas Geiger,et al.  Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Silvio Savarese,et al.  Interactive Gibson Benchmark: A Benchmark for Interactive Navigation in Cluttered Environments , 2019, IEEE Robotics and Automation Letters.

[39]  Ingmar Posner,et al.  GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations , 2019, ICLR.

[40]  Ali Razavi,et al.  Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.

[41]  Stefan Hinterstoißer,et al.  An Annotation Saved is an Annotation Earned: Using Fully Synthetic Training for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[42]  Ming-Ming Cheng,et al.  EGNet: Edge Guidance Network for Salient Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Andrew Zisserman,et al.  Sim2real transfer learning for 3D pose estimation: motion to the rescue , 2019, ArXiv.

[44]  Gordon Wetzstein,et al.  Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[45]  Simon Lucey,et al.  Web Stereo Video Supervision for Depth Prediction from Dynamic Scenes , 2019, 2019 International Conference on 3D Vision (3DV).

[46]  Zhe Wu,et al.  Cascaded Partial Decoder for Fast and Accurate Salient Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[48]  Hao Li,et al.  Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  Jitendra Malik,et al.  Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Allan Jabri,et al.  Learning Correspondence From the Cycle-Consistency of Time , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Klaus Greff,et al.  Multi-Object Representation Learning with Iterative Variational Inference , 2019, ICML.

[52]  Matthew Botvinick,et al.  MONet: Unsupervised Scene Decomposition and Representation , 2019, ArXiv.

[53]  Sergey Levine,et al.  Sim-To-Real via Sim-To-Sim: Data-Efficient Robotic Grasping via Randomized-To-Canonical Adaptation Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Marc Alexa,et al.  ABC: A Big CAD Model Dataset for Geometric Deep Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Luc Van Gool,et al.  Learning Semantic Segmentation From Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Yair Weiss,et al.  Why do deep convolutional networks generalize so poorly to small image transformations? , 2018, J. Mach. Learn. Res..

[57]  Quoc V. Le,et al.  Do Better ImageNet Models Transfer Better? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Deva Ramanan,et al.  Volumetric Correspondence Networks for Optical Flow , 2019, NeurIPS.

[59]  James Traer,et al.  A PERCEPTUALLY INSPIRED GENERATIVE MODEL OF RIGID-BODY CONTACT SOUNDS , 2019 .

[60]  Peyman Moghadam,et al.  Deep Leaf Segmentation Using Synthetic Data , 2018, BMVC.

[61]  Sergey Levine,et al.  QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.

[62]  Jitendra Malik,et al.  Gibson Env: Real-World Perception for Embodied Agents , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[63]  Martin Jägersand,et al.  ByLabel: A Boundary Based Semi-Automatic Image Annotation Tool , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[64]  Hayit Greenspan,et al.  Synthetic data augmentation using GAN for improved liver lesion classification , 2018, 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018).

[65]  Tatsuya Harada,et al.  Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[66]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[67]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  Ali Farhadi,et al.  AI2-THOR: An Interactive 3D Environment for Visual AI , 2017, ArXiv.

[69]  Stefan Leutenegger,et al.  SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation? , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[70]  Laurens van der Maaten,et al.  Submanifold Sparse Convolutional Networks , 2017, ArXiv.

[71]  Luc Van Gool,et al.  The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[72]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Li Fei-Fei,et al.  CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[74]  Honglak Lee,et al.  Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision , 2016, NIPS.

[75]  Alan L. Yuille,et al.  UnrealCV: Connecting Computer Vision to Unreal Engine , 2016, ECCV Workshops.

[76]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[77]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[78]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[79]  A. Vedaldi,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[80]  Eric Lengyel Volumetric Hierarchical Approximate Convex Decomposition , 2016 .

[81]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[82]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[83]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[84]  Barbara Caputo,et al.  A Deeper Look at Dataset Bias , 2015, Domain Adaptation in Computer Vision Applications.

[85]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[86]  M. Cugmas,et al.  On comparing partitions , 2015 .

[87]  Ali Borji,et al.  Salient Object Detection: A Benchmark , 2015, IEEE Transactions on Image Processing.

[88]  Tae-Yong Kim,et al.  Unified particle physics for real-time applications , 2014, ACM Trans. Graph..

[89]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[90]  Remi Arnaud,et al.  glTF: Designing an Open-Standard Runtime Asset Format , 2014 .

[91]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[92]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[93]  Bo Han,et al.  TouchCut: Fast image and video segmentation using single-touch interaction , 2014, Comput. Vis. Image Underst..

[94]  Nadia Hashem,et al.  "I" and "others" , 2013 .

[95]  ImageNet Classification with Deep Convolutional Neural , 2013 .

[96]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[97]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[98]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[99]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[100]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[101]  Sabine Süsstrunk,et al.  Frequency-tuned salient region detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[102]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[103]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[104]  Demetri Terzopoulos,et al.  Snakes: Active contour models , 2004, International Journal of Computer Vision.

[105]  J. Sethian,et al.  FRONTS PROPAGATING WITH CURVATURE DEPENDENT SPEED: ALGORITHMS BASED ON HAMILTON-JACOB1 FORMULATIONS , 2003 .

[106]  Ingemar J. Cox,et al.  "Ratio regions": a technique for image segmentation , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[107]  Richard M. Leahy,et al.  An Optimal Graph Theoretic Approach to Data Clustering: Theory and Its Application to Image Segmentation , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[108]  David J. Fleet,et al.  Performance of optical flow techniques , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.