An Active Vision System for Detecting, Fixating and Manipulating Objects in the Real World

The ability to autonomously acquire new knowledge through interaction with the environment is an important research topic in robotics. Such knowledge can only be acquired if suitable perception-action capabilities are present: a robotic system has to be able to detect, attend to and manipulate objects in its surroundings. In this paper, we present the results of our long-term work in the area of vision-based sensing and control, studying how objects in domestic environments can be found, attended to, recognized and manipulated. We present a stereo-based vision system framework in which top-down and bottom-up attention as well as foveated attention are put into focus, and demonstrate how the system can be utilized for robotic object grasping.
