Feedback from Pixels: Output Regulation via Learning-based Scene View Synthesis

We propose a novel controller synthesis involving feedback from pixels, whereby the measurement is a high-dimensional signal representing a pixelated image with Red-Green-Blue (RGB) values. The approach requires neither feature extraction, nor object detection, nor visual correspondence. The control policy does not involve the estimation of states or similar latent representations. Instead, tracking is achieved directly in image space, with a model of the reference signal embedded as required by the internal model principle. The reference signal is generated by a neural network with learning-based scene view synthesis capabilities. Our approach does not require end-to-end learning of a pixel-to-action control policy. The approach is applied to a motion control problem, namely the longitudinal dynamics of car following. We show how this approach lends itself to a tractable stability analysis, with associated bounds critical to establishing trustworthiness and interpretability of the closed-loop dynamics.
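To make the control structure concrete, the following is a minimal sketch of feedback from pixels under strong simplifying assumptions, not the paper's implementation: the class `PixelSpaceRegulator`, its gains, and the scalar photometric error are illustrative inventions, and the learned view-synthesis network that would produce the reference image is assumed rather than implemented. The integrator supplies the embedded model of a (constant) reference, the simplest instance of the internal model principle.

```python
import numpy as np

class PixelSpaceRegulator:
    """PI-type regulator acting directly on an image-space error signal.

    The integrator state embeds a model of a constant reference, the
    simplest instance of the internal model principle.
    """

    def __init__(self, kp=1e-4, ki=1e-5, dt=0.05):
        self.kp = kp          # proportional gain (illustrative value)
        self.ki = ki          # integral gain (illustrative value)
        self.dt = dt          # sampling period in seconds
        self.integral = 0.0   # integrator state (the internal model)

    def step(self, image, ref_image):
        """One control update from a camera frame and a reference frame.

        `ref_image` would come from a learned view-synthesis network that
        renders the scene as it should appear at the desired relative
        pose; that network is assumed here, not implemented.
        """
        # Scalar photometric error: mean RGB discrepancy over all pixels.
        # A real design would use a better-conditioned image-space error;
        # the mean difference is only a placeholder.
        error = float(np.mean(ref_image.astype(np.float64) -
                              image.astype(np.float64)))
        self.integral += error * self.dt
        # Control input, e.g., commanded longitudinal acceleration
        # in the car-following application.
        return self.kp * error + self.ki * self.integral


# Usage with synthetic frames (stand-ins for the camera image and the
# synthesized reference view):
regulator = PixelSpaceRegulator()
current = np.zeros((64, 64, 3), dtype=np.uint8)
reference = np.full((64, 64, 3), 10, dtype=np.uint8)
u = regulator.step(current, reference)  # positive error -> accelerate
```

Note that no state or latent representation is estimated anywhere in the loop: the controller consumes the raw pixel discrepancy between the measured and synthesized views, consistent with the abstract's claim.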
