Deep Forward and Inverse Perceptual Models for Tracking and Prediction

We consider two learning problems in robotics: forward models that map state to high-dimensional images, and inverse models that map high-dimensional images to state. Specifically, we present a perceptual model that generates video frames from state using deep networks, and provide a framework for its use in tracking and prediction tasks. We show that our proposed model greatly outperforms standard deconvolutional methods and GANs for image generation, producing clear, photo-realistic images. We also develop a convolutional neural network model for state estimation and compare the result to an Extended Kalman Filter for estimating robot trajectories. We validate all models on a real robotic system.
