Semantically Controllable Scene Generation with Guidance of Explicit Knowledge

Deep Generative Models (DGMs) are known for their superior capability in generating realistic data. Extending purely data-driven approaches, recent specialized DGMs can satisfy additional control requirements, such as embedding a traffic sign in a driving scene, by implicitly manipulating patterns at the neuron or feature level. In this paper, we introduce a novel method that incorporates domain knowledge explicitly into the generation process to achieve semantically controllable scene generation. We categorize our knowledge into two types, consistent with the composition of natural scenes: the first type represents the properties of objects, and the second type represents the relationships among objects. We then propose a tree-structured generative model to learn complex scene representations, whose nodes and edges naturally correspond to the two types of knowledge, respectively. Knowledge can be explicitly integrated to enable semantically controllable scene generation by imposing semantic rules on the properties of nodes and edges in the tree structure. We construct a synthetic example to illustrate the controllability and explainability of our method in a clean setting. We further extend the synthetic example to realistic autonomous vehicle driving environments and conduct extensive experiments to show that our method efficiently identifies adversarial traffic scenes against different state-of-the-art 3D point cloud segmentation models while satisfying the traffic rules specified as explicit knowledge.
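To make the two knowledge types concrete, the following is a minimal sketch of a scene tree in which one rule constrains a node property and another constrains a parent-child edge. It is not the paper's generative model: the `Node` class, the 1-D `position` and `width` fields, and the functions `node_rule`, `edge_rule`, `violations`, and `repair` are all hypothetical names chosen for illustration, and the clamping step is only an assumed stand-in for however the rules are actually imposed during generation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One object in a scene tree (hypothetical, 1-D for simplicity)."""
    kind: str
    position: float          # centre of the object along one axis
    width: float             # extent of the object along that axis
    children: List["Node"] = field(default_factory=list)


def node_rule(node: Node) -> bool:
    """Property knowledge: every object must have a positive extent."""
    return node.width > 0


def edge_rule(parent: Node, child: Node) -> bool:
    """Relationship knowledge: a child must lie fully inside its parent."""
    offset = abs(child.position - parent.position)
    return offset + child.width / 2 <= parent.width / 2


def violations(root: Node) -> List[str]:
    """Walk the tree and collect every violated node or edge rule."""
    found = []
    if not node_rule(root):
        found.append(f"node:{root.kind}")
    for child in root.children:
        if not edge_rule(root, child):
            found.append(f"edge:{root.kind}->{child.kind}")
        found.extend(violations(child))
    return found


def repair(root: Node) -> None:
    """Clamp each child back inside its parent (assumed repair strategy)."""
    for child in root.children:
        max_offset = max(root.width / 2 - child.width / 2, 0.0)
        offset = child.position - root.position
        offset = min(max(offset, -max_offset), max_offset)
        child.position = root.position + offset
        repair(child)


# A road of width 10 with two cars; the second car overhangs the road edge.
scene = Node("road", 0.0, 10.0, [Node("car", 1.0, 2.0), Node("car", 4.5, 2.0)])
print(violations(scene))   # the second car breaks the edge rule
repair(scene)
print(violations(scene))   # no rule is violated after clamping
```

Keeping the rules as separate functions over nodes and edges mirrors the split between object properties and object relationships described above, so a new rule can be added without touching the tree traversal.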
