Exploring convolutional networks for end-to-end visual servoing

Current image-based visual servoing approaches rely on extracting hand-crafted visual features from an image. Choosing the right set of features is important, as it directly affects the performance of any such approach. Motivated by recent breakthroughs in the performance of data-driven methods on recognition and localization tasks, we aim to learn visual feature representations suitable for servoing tasks in unstructured and unknown environments. In this paper, we present an end-to-end, learning-based approach for visual servoing in diverse scenes where knowledge of camera parameters and scene geometry is not available a priori. This is achieved by training a convolutional neural network on color images with synchronised camera poses. Through experiments performed in simulation and on a quadrotor, we demonstrate the efficacy and robustness of our approach for a wide range of camera poses in both indoor and outdoor environments.
