SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments

Recent advances in deep learning, computer vision, and embodied AI have given rise to synthetic video datasets for causal reasoning. These datasets facilitate the development of AI algorithms that can reason about physical interactions between objects. However, existing datasets have focused primarily on elementary physical events such as rolling or falling, and few cover the physical interactions that humans perform daily with objects in the real world. To address this gap, we introduce SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments. We use the SPACE simulator to generate the SPACE dataset, a synthetic video dataset set in a 3D environment, for systematically evaluating physics-based models on a range of physical causal reasoning tasks. Inspired by daily object interactions, the SPACE dataset comprises videos depicting three types of physical events: containment, stability, and contact. These events make up the vast majority of the basic physical interactions between objects. We evaluate the SPACE dataset with a state-of-the-art physics-based deep model and show that, with an approach inspired by curriculum learning, it improves the learning of intuitive physics. Repository: https://github.com/jiafei1224/SPACE
