State-Consistency Loss for Learning Spatial Perception Tasks From Partial Labels

When learning models for real-world robot spatial perception tasks, one might have access only to partial labels. This occurs, for example, in semi-supervised scenarios (where labels are unavailable for a subset of the training instances) or in some forms of self-supervised robot learning (where the robot autonomously acquires a labeled training set, but obtains labels for only a subset of the output variables in each instance). We introduce a general approach to this class of problems based on an auxiliary loss that enforces the expectation that the perceived environment state should not change abruptly. We then instantiate the approach on two robot perception problems: a simulated ground robot learning long-range obstacle mapping as a 400-binary-label classification task, in a self-supervised way, in a static environment; and a real nano-quadrotor learning human pose estimation as a 3-variable regression task, in a semi-supervised way, in a dynamic environment. In both cases, our approach yields significant quantitative improvements over baselines: an average increase of 6 AUC percentage points in the former, and a relative improvement of the $R^2$ metric ranging from 7% to 33% in the latter.
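
The idea can be illustrated with a minimal, hypothetical PyTorch-style sketch; it is not the paper's exact formulation. The model's predictions for two temporally adjacent observations are encouraged to agree through an auxiliary state-consistency term, while the supervised term is computed only on the outputs for which labels exist. The names (model, label_mask, lam) and the choice of mean-squared-error losses are assumptions made here for the regression instance.

import torch.nn.functional as F

def state_consistency_loss(pred_t, pred_tp1):
    # Auxiliary term: the perceived environment state should not change
    # abruptly between consecutive observations (illustrative choice:
    # mean squared difference of the two predicted states).
    return F.mse_loss(pred_t, pred_tp1)

def partial_label_loss(model, x_t, x_tp1, labels, label_mask, lam=0.1):
    # Combine the supervised loss, restricted to the labeled outputs,
    # with the weighted state-consistency term. label_mask is a boolean
    # tensor marking which outputs are labeled; lam is an illustrative weight.
    pred_t = model(x_t)        # prediction for the observation at time t
    pred_tp1 = model(x_tp1)    # prediction for the observation at time t+1
    sup = F.mse_loss(pred_t[label_mask], labels[label_mask])
    aux = state_consistency_loss(pred_t, pred_tp1)
    return sup + lam * aux

For the binary obstacle-labeling task, the supervised term would instead use a per-label binary cross-entropy, but the structure of the combined loss stays the same.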
