An overview of robot vision

Robot vision is an interdisciplinary field that deals with how robots can be made to gain high-level understanding from digital images or videos. Understanding an image at the pixel level often does not provide enough information for decision making and action taking. In this case, higher level semantic information that describes the image is required. This helps the robot to accomplish complex tasks that require visual understanding.For robots to add value they need to be sufficiently effective at executing tasks in different settings. Despite many impressive advances in robot vision, robots still lack the ability to function as humans do in complex environments. Importantly, this includes being able to interpret and understand the perceptual complexities of the world.Robot vision is dependant on ideas from both computer vision and machine learning. In this paper we provide a overview of the advances in these disciplines and how they contribute to robot vision.

[1]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[3]  Sanja Fidler,et al.  Holistic Scene Understanding for 3D Object Detection with RGBD Cameras , 2013, 2013 IEEE International Conference on Computer Vision.

[4]  F. Binkofski,et al.  Can Machines Think? Interaction and Perspective Taking with Robots Investigated via fMRI , 2008, PloS one.

[5]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Kurt Konolige,et al.  FrameSLAM: From Bundle Adjustment to Real-Time Visual Mapping , 2008, IEEE Transactions on Robotics.

[7]  Christoph H. Lampert,et al.  Efficient Subwindow Search: A Branch and Bound Framework for Object Localization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Tom Duckett,et al.  Mini-SLAM: Minimalistic Visual SLAM in Large-Scale Environments Based on a New Interpretation of Image Similarity , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[10]  David A. Forsyth,et al.  Detecting Anomalous Faces with 'No Peeking' Autoencoders , 2018, ArXiv.

[11]  Zhao Li,et al.  Visual place recognition for multi-robots maps merging , 2012, 2012 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR).

[12]  Alan Yuille,et al.  Active Vision , 2014, Computer Vision, A Reference Guide.

[13]  William D. Smart,et al.  Robots for humanity: using assistive robotics to empower people with disabilities , 2013, IEEE Robotics & Automation Magazine.

[14]  Sanjay Chawla,et al.  Anomaly Detection using One-Class Neural Networks , 2018, ArXiv.

[15]  Chu Kiong Loo,et al.  SLAMM: Visual monocular SLAM with continuous mapping using multiple maps , 2018, PloS one.

[16]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[17]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[18]  Trevor Darrell,et al.  A geometric approach to robotic laundry folding , 2012, Int. J. Robotics Res..

[19]  Andreas Geiger,et al.  Joint 3D Object and Layout Inference from a Single RGB-D Image , 2015, GCPR.

[20]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[21]  Bo Wang,et al.  Image captioning based on deep reinforcement learning , 2018, ICIMCS '18.

[22]  Stefano Soatto,et al.  Class segmentation and object localization with superpixel neighborhoods , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[23]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Michael J. Black,et al.  Optical Flow with Semantic Segmentation and Localized Layers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Javier González,et al.  Creating Metric-topological Maps for Large-scale Monocular SLAM , 2013, ICINCO.

[28]  Michael Milford,et al.  Condition-invariant, top-down visual place recognition , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[29]  Li Fei-Fei,et al.  Progressive Neural Architecture Search , 2017, ECCV.

[30]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Paul Newman,et al.  Appearance-only SLAM at large scale with FAB-MAP 2.0 , 2011, Int. J. Robotics Res..

[32]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[33]  Bin Yang,et al.  PIXOR: Real-time 3D Object Detection from Point Clouds , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[35]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[36]  Wolfram Burgard,et al.  An evaluation of the RGB-D SLAM system , 2012, 2012 IEEE International Conference on Robotics and Automation.

[37]  Christoph H. Lampert,et al.  Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[39]  Carl Doersch,et al.  Tutorial on Variational Autoencoders , 2016, ArXiv.

[40]  Albert S. Huang,et al.  Visual Odometry and Mapping for Autonomous Flight Using an RGB-D Camera , 2011, ISRR.

[41]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[42]  Shengyong Chen,et al.  Active vision in robotic systems: A survey of recent developments , 2011, Int. J. Robotics Res..

[43]  Michael Burke,et al.  User-driven mobile robot storyboarding: Learning image interest and saliency from pairwise image comparisons , 2017, ArXiv.

[44]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[45]  Amit Kumar Mondal,et al.  Obstacle avoidance for mobile robot using RGB-D camera , 2017, 2017 International Conference on Intelligent Sustainable Systems (ICISS).

[46]  Chaoyang Zhu,et al.  Place recognition: An Overview of Vision Perspective , 2017, Applied Sciences.

[47]  Ashutosh Saxena,et al.  Robotic Grasping of Novel Objects using Vision , 2008, Int. J. Robotics Res..

[48]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Cordelia Schmid,et al.  Combining efficient object localization and image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[50]  B. Ravi Kiran,et al.  An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos , 2018, J. Imaging.

[51]  Martin Lauer,et al.  3D Traffic Scene Understanding From Movable Platforms , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Li Fei-Fei,et al.  Towards total scene understanding: Classification, annotation and segmentation in an automatic framework , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[56]  Darius Burschka,et al.  V-GPS(SLAM): vision-based inertial system for mobile robots , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.

[57]  Peter I. Corke,et al.  Visual Place Recognition: A Survey , 2016, IEEE Transactions on Robotics.

[58]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[59]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[60]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[61]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[62]  Andrew Calway,et al.  Visual mapping using learned structural priors , 2013, 2013 IEEE International Conference on Robotics and Automation.

[63]  Peter V. Gehler,et al.  Efficient 2D and 3D Facade Segmentation Using Auto-Context , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[64]  Evgeny Burnaev,et al.  Reinforcement learning in computer vision , 2018, International Conference on Machine Vision.

[65]  Peter Corke,et al.  Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach , 2018, Robotics: Science and Systems.

[66]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Yiannis Aloimonos,et al.  Active vision , 2004, International Journal of Computer Vision.