Semantically Controllable Generation of Physical Scenes with Explicit Knowledge

Deep Generative Models (DGMs) are known for their superior capability in generating realistic data. Extending purely data-driven approaches, recent specialized DGMs satisfy additional controllability requirements, such as embedding a traffic sign in a driving scene, by implicitly manipulating patterns at the neuron or feature level. In this paper, we introduce a novel method that incorporates domain knowledge explicitly into the generation process to achieve semantically controllable generation of physical scenes. We first categorize our knowledge into two types, the properties of objects and the relationships among objects, to be consistent with the composition of natural scenes. We then propose a tree-structured generative model to learn hierarchical scene representations, whose nodes and edges naturally correspond to the two types of knowledge, respectively. Consequently, explicit knowledge integration enables semantically controllable generation by imposing semantic rules on the properties of nodes and edges in the tree structure. We construct a synthetic example to illustrate the controllability and explainability of our method in a succinct setting. We further extend the synthetic example to realistic environments for autonomous vehicles and conduct extensive experiments: our method efficiently identifies adversarial physical scenes against different state-of-the-art 3D point cloud segmentation models while satisfying the traffic rules specified as explicit knowledge.
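
The two knowledge types map onto a tree in which each node stores one object's properties and each edge stores a relationship between two objects. The following is a minimal sketch, assuming a hand-built Python data structure and hand-written rules, of how semantic rules can be checked against such a tree. The class `SceneNode`, the functions `violations`, `lane_direction` and `pedestrian_placement`, and the 30-degree heading tolerance are hypothetical illustrations; this is not the paper's learned tree-structured generative model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration only: a hand-built scene tree plus rule checking,
# not the learned tree-structured generative model described in the paper.


@dataclass
class SceneNode:
    """Node = one object; `properties` holds knowledge about that object."""
    name: str
    category: str
    properties: Dict[str, float] = field(default_factory=dict)
    # Each child is stored with the relation label of the connecting edge,
    # i.e. knowledge about the relationship between the two objects.
    children: List[Tuple[str, "SceneNode"]] = field(default_factory=list)

    def add_child(self, relation: str, child: "SceneNode") -> "SceneNode":
        self.children.append((relation, child))
        return child


# A rule inspects one edge (parent, relation, child); True means satisfied.
Rule = Callable[[SceneNode, str, SceneNode], bool]


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two headings in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def lane_direction(parent: SceneNode, relation: str, child: SceneNode) -> bool:
    # Assumed traffic rule: a vehicle on a lane heads within 30 degrees of it.
    if relation == "on" and parent.category == "lane" and child.category == "vehicle":
        return angle_diff(child.properties["heading"], parent.properties["direction"]) <= 30.0
    return True


def pedestrian_placement(parent: SceneNode, relation: str, child: SceneNode) -> bool:
    # Assumed traffic rule: pedestrians stand only on sidewalks or crosswalks.
    if child.category == "pedestrian":
        return parent.category in {"sidewalk", "crosswalk"}
    return True


RULES: Dict[str, Rule] = {
    "lane_direction": lane_direction,
    "pedestrian_placement": pedestrian_placement,
}


def violations(root: SceneNode, rules: Dict[str, Rule]) -> List[str]:
    """Walk the tree depth-first and report every edge that breaks a rule."""
    found: List[str] = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for relation, child in parent.children:
            for rule_name, rule in rules.items():
                if not rule(parent, relation, child):
                    found.append(f"{rule_name}: {parent.name} -{relation}-> {child.name}")
            stack.append(child)
    return found


# Usage: one lane with a compliant car, a wrong-way car, and a pedestrian.
road = SceneNode("road", "road")
lane = road.add_child("contains", SceneNode("lane_0", "lane", {"direction": 90.0}))
lane.add_child("on", SceneNode("car_0", "vehicle", {"heading": 95.0}))
lane.add_child("on", SceneNode("car_1", "vehicle", {"heading": 270.0}))
lane.add_child("on", SceneNode("ped_0", "pedestrian"))
sidewalk = road.add_child("contains", SceneNode("sidewalk_0", "sidewalk"))
sidewalk.add_child("on", SceneNode("ped_1", "pedestrian"))

report = violations(road, RULES)
print(report)
# ['lane_direction: lane_0 -on-> car_1', 'pedestrian_placement: lane_0 -on-> ped_0']
```

Keeping relations on edges rather than inside node properties means a rule such as lane direction needs only the two endpoint nodes of a single edge, so each rule stays local and easy to read.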
