A hierarchical active binocular robot vision architecture for scene exploration and object appearance learning

This thesis presents an investigation of a computational model of hierarchical visual behaviours within an active binocular robot vision architecture. The robot vision system is able to localise multiple instances of the same object class while simultaneously maintaining vergence and directing its gaze to attend to and recognise objects within cluttered, complex scenes. This is achieved by implementing all image analysis in an egocentric symbolic space, without creating explicit pixel-space maps and without the need for calibration or other knowledge of the camera geometry. An important requirement of the active binocular vision paradigm is that visual features in both cameras be bound together in order to drive visual search to saccade to, locate and recognise putative objects or salient locations in the robot’s field of view. The system structure is based on the “attentional spotlight” metaphor of biological systems and on a collection of abstract and reactive visual behaviours arranged in a hierarchy. Several studies have shown that the human brain represents and learns objects for recognition as snapshots of 2-dimensional views of the imaged scene that happens to contain the object of interest during active interaction with (exploration of) the environment. Likewise, psychophysical findings indicate that the primate visual cortex represents common everyday objects by a hierarchical structure of their parts or sub-features and, consequently, recognises them using simple but imperfect 2D view approximations of object parts. This thesis incorporates these observations into an active visual learning behaviour within the hierarchical active binocular robot vision architecture. By actively exploring the object viewing sphere (as higher mammals do), the robot vision system automatically synthesises its own part-based object representation from multiple observations, while a human teacher indicates the object and supplies a classification name.
It is proposed to adopt the computational concepts of a visual learning exploration mechanism that controls the accumulation of visual evidence and directs attention towards the spatially salient object parts. The behavioural structure of the binocular robot vision architecture is loosely modelled on the WHAT and WHERE visual streams. The WHERE stream maintains and binds spatial attention on the object part coordinates that egocentrically characterise the location of the object of interest, and extracts spatio-temporal properties of feature coordinates and descriptors. The WHAT stream either determines the identity of an object or triggers a learning behaviour that stores view-invariant feature descriptions of the object part. As a result, the robot vision system is capable of performing a collection of specific visual tasks such as vergence, detection, discrimination, recognition, localisation and identification of multiple instances of the same object class. This classification of tasks enables the robot vision system to execute and fulfil specified high-level tasks, e.g. autonomous scene exploration and active object appearance learning.
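
The WHAT-stream decision described above (identify the attended object, or otherwise trigger learning when a teacher supplies a name) can be illustrated with a minimal sketch. This is not the thesis implementation: the names `Fixation`, `ObjectMemory` and `what_stream`, the Euclidean descriptor matching, and the `max_dist` and `min_matches` thresholds are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Descriptor = Tuple[float, ...]  # hypothetical stand-in for a local feature descriptor


@dataclass
class Fixation:
    """Hypothetical WHERE-stream output: egocentric gaze angles plus descriptors."""
    gaze: Tuple[float, float]
    descriptors: List[Descriptor]


@dataclass
class ObjectMemory:
    """Hypothetical store mapping a class name to its learned part descriptors."""
    models: Dict[str, List[Descriptor]] = field(default_factory=dict)


def _distance(a: Descriptor, b: Descriptor) -> float:
    # Plain Euclidean distance; an assumed matching criterion.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def what_stream(
    fix: Fixation,
    memory: ObjectMemory,
    teacher_label: Optional[str] = None,
    max_dist: float = 0.5,   # assumed match threshold
    min_matches: int = 2,    # assumed evidence threshold
) -> Tuple[str, Optional[str]]:
    """Recognise the attended object, or learn it if a teacher names it."""
    best_label, best_count = None, 0
    for label, stored in memory.models.items():
        # Count observed descriptors that have a close stored counterpart.
        count = sum(
            1 for d in fix.descriptors
            if any(_distance(d, s) <= max_dist for s in stored)
        )
        if count > best_count:
            best_label, best_count = label, count
    if best_count >= min_matches:
        return ("recognised", best_label)
    if teacher_label is not None:
        # Learning behaviour: accumulate descriptors under the supplied name.
        memory.models.setdefault(teacher_label, []).extend(fix.descriptors)
        return ("learned", teacher_label)
    return ("unknown", None)
```

In this sketch, an unrecognised fixation returns `("unknown", None)` unless a `teacher_label` is given, in which case its descriptors are stored and later fixations with similar descriptors are reported as recognised.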
