Paper Information - Deep Forward and Inverse Perceptual Models for Tracking and Prediction

We consider the problems of learning forward models that map state to high-dimensional images and inverse models that map high-dimensional images to state in robotics. Specifically, we present a perceptual model for generating video frames from state with deep networks, and provide a framework for its use in tracking and prediction tasks. We show that our proposed model greatly outperforms standard deconvolutional methods and GANs for image generation, producing clear, photo-realistic images. We also develop a convolutional neural network model for state estimation and compare the result to an Extended Kalman Filter to estimate robot trajectories. We validate all models on a real robotic system.
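The abstract names an Extended Kalman Filter as the baseline for trajectory estimation. As a rough illustration of what such a filter does, here is a minimal scalar sketch. It is not the paper's configuration: the 1-D state (a single joint angle), the motion model `x + u*dt`, the measurement model `sin(x)`, the noise variances, and the function names `ekf_step` and `track` are all assumptions chosen only to keep the example short.

```python
import math


def ekf_step(x, p, u, z, q=1e-3, r=1e-2, dt=0.1):
    """One predict/update cycle of a scalar Extended Kalman Filter.

    x, p : prior state estimate (a joint angle in radians) and its variance
    u    : commanded angular velocity (rad/s)
    z    : measurement, modeled here (hypothetically) as sin(angle) plus noise
    q, r : process and measurement noise variances (illustrative values)
    """
    # Predict: propagate the state through the motion model f(x, u) = x + u*dt.
    x_pred = x + u * dt
    f_jac = 1.0                       # df/dx for this linear motion model
    p_pred = f_jac * p * f_jac + q

    # Update: linearize the measurement model h(x) = sin(x) around x_pred.
    h_jac = math.cos(x_pred)          # dh/dx evaluated at the prediction
    innovation = z - math.sin(x_pred)
    s = h_jac * p_pred * h_jac + r    # innovation variance
    k = p_pred * h_jac / s            # Kalman gain
    x_new = x_pred + k * innovation
    p_new = (1.0 - k * h_jac) * p_pred
    return x_new, p_new


def track(measurements, controls, x0=0.0, p0=1.0):
    """Run the filter over a sequence and return the estimated trajectory."""
    x, p = x0, p0
    trajectory = []
    for u, z in zip(controls, measurements):
        x, p = ekf_step(x, p, u, z)
        trajectory.append(x)
    return trajectory
```

In this sketch, `track` would be called with one measurement and one control per time step. A learned state estimator, such as the convolutional network the abstract describes, would instead map each image directly to a state estimate without the explicit linearization step.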
