Enabling Depth-Driven Visual Attention on the iCub Humanoid Robot: Instructions for Use and New Perspectives

Reliable depth perception enables a wide variety of attentional and interactive behaviors on humanoid robots. In real scenarios, however, the use of depth is hindered by the difficulty of computing robust binocular disparity maps in real time from moving stereo cameras. On the iCub humanoid robot we recently adopted the Efficient Large-scale Stereo (ELAS) matching algorithm to compute the disparity map. In this technical report we show that this algorithm allows reliable depth perception, and we provide experimental evidence that it can be used to solve challenging visual tasks in real-world indoor settings. As a case study we consider the common situation in which the robot is asked to focus its attention on a single nearby object in the scene, and we show how a simple but effective disparity-based segmentation solves the problem in this case. This example paves the way for a variety of similar applications.
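
The abstract does not spell out the segmentation procedure, so the following is only a minimal sketch of what a disparity-based segmentation of the closest object could look like, not the method used in the report. It assumes a dense disparity map is already available (for example from ELAS) as a 2D grid in which larger values mean closer surfaces and non-positive values mark invalid pixels. The function name `segment_closest` and the tolerance parameter `tol` are hypothetical and chosen for illustration.

```python
from collections import deque


def segment_closest(disparity, tol=2.0):
    """Return a boolean mask of the blob around the closest pixel.

    disparity: list of rows of numbers; larger value = closer surface,
               values <= 0 are treated as invalid (assumption).
    tol:       hypothetical tolerance; pixels within `tol` of the
               maximum disparity are considered part of the object.
    """
    rows, cols = len(disparity), len(disparity[0])
    mask = [[False] * cols for _ in range(rows)]

    # Seed at the pixel with the largest disparity, i.e. the closest point.
    seed = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda p: disparity[p[0]][p[1]],
    )
    d_max = disparity[seed[0]][seed[1]]
    if d_max <= 0:
        return mask  # no valid disparity anywhere

    # Grow the region over 4-connected neighbours that are nearly as close.
    mask[seed[0]][seed[1]] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and not mask[nr][nc]
                and disparity[nr][nc] >= d_max - tol
            ):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


# Toy example: a 2x2 near object in the centre and an isolated near pixel
# in the corner that is not connected to it.
example = [
    [1, 1, 1, 1],
    [1, 9, 8, 1],
    [1, 8, 9, 1],
    [0, 1, 1, 8],
]
example_mask = segment_closest(example, tol=2.0)
```

Growing the region from the closest pixel, rather than thresholding the whole map, keeps other equally near but disconnected surfaces out of the mask. A real pipeline would likely also need to filter disparity noise before picking the seed.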
