Active Object Recognition Integrating Attention and Viewpoint Control

We present an active object recognition strategy that combines an attention mechanism for focusing the search for a 3D object in a 2D image with a viewpoint control strategy for disambiguating recovered object features. The attention mechanism consists of a probabilistic search through a hierarchy of predicted feature observations, mapping objects to a set of regions classified according to the shapes of their bounding contours. We motivate the use of image regions as a focus feature and compare their uncertainty in inferring objects with the uncertainty of more commonly used features such as lines or corners. If the features recovered during the attention phase do not provide a unique mapping to the 3D object being searched for, the probabilistic feature hierarchy can be used to guide the camera to a new viewpoint from which the object can be disambiguated. The power of the underlying representation is its ability to unify these object recognition behaviors within a single framework. We present the approach in detail and evaluate its performance in the context of a project providing robotic aids for the disabled.
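As a rough illustration of the two behaviors described above, the following minimal sketch ranks candidate focus features by how uncertain they leave the object identity, and then picks the viewpoint most likely to confirm a target object. It is not the method of the paper: Shannon entropy is used here only as an assumed stand-in for "uncertainty", and every table, probability, object name, and function name is hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution {label: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical conditional tables P(object | observed feature type).
# The numbers are invented for illustration only.
P_OBJECT_GIVEN_FEATURE = {
    "line":   {"cup": 0.25, "lamp": 0.25, "phone": 0.25, "book": 0.25},
    "corner": {"cup": 0.5, "lamp": 0.25, "phone": 0.25},
    "region": {"cup": 1.0},
}

def rank_focus_features(table):
    """Order feature types from least to most uncertain about the object."""
    return sorted(table, key=lambda feature: entropy(table[feature]))

# Hypothetical P(object | features visible from a viewpoint).
P_OBJECT_GIVEN_VIEW = {
    "front": {"cup": 0.5, "lamp": 0.5},
    "side":  {"cup": 0.75, "lamp": 0.25},
}

def best_viewpoint(view_table, target):
    """Pick the viewpoint under which the target object is most probable."""
    return max(view_table, key=lambda view: view_table[view].get(target, 0.0))

if __name__ == "__main__":
    print(rank_focus_features(P_OBJECT_GIVEN_FEATURE))
    print(best_viewpoint(P_OBJECT_GIVEN_VIEW, "cup"))
```

In this toy setting the region feature leaves the least ambiguity about the object, which mirrors the motivation for regions as a focus feature; a real system would estimate such tables from the object models rather than write them by hand.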


[21]  Philippe Saint-Marc,et al.  Adaptive Smoothing: A General Tool for Early Vision , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Alain Jacot-Descombes,et al.  Probabilistic approach to 3-D inference of geons from a 2-D view , 1992, Defense, Security, and Sensing.

[23]  John K. Tsotsos,et al.  An Attentional Prototype for Early Vision , 1992, ECCV.

[24]  Kjell Brunnström,et al.  Active Detection and Classsification of Junctions by Foveation with a Head-Eye System Guided by the Scale-Space Primal Sketch , 1992, ECCV.

[25]  Sven J. Dickinson,et al.  Unified approach to the recognition of expected and unexpected geon-based objects , 1992, Defense, Security, and Sensing.

[26]  John K. Tsotsos,et al.  Active object recognition , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27]  Azriel Rosenfeld,et al.  3-D Shape Recovery Using Distributed Aspect Matching , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[28]  I. Biederman,et al.  Dynamic binding in a neural network for shape recognition. , 1992, Psychological review.

[29]  John K. Tsotsos,et al.  A prototype for data-driven visual attention , 1992, [1992] Proceedings. 11th IAPR International Conference on Pattern Recognition.

[30]  Christopher M. Brown,et al.  Where to Look Next Using a Bayes Net: Incorporating Geometric Relations , 1992, ECCV.

[31]  Mengxiang Li Minimum Description Length Based 2d Shape Description Minimum Description Length Based 2d Shape Description , 1992 .

[32]  Henrik I. Christensen,et al.  Bayesian methods for interpretation and control in multiagent vision systems , 1992, Defense, Security, and Sensing.

[33]  Azriel Rosenfeld,et al.  From volumes to views: An approach to 3-D object recognition , 1992, CVGIP Image Underst..

[34]  Anil K. Jain,et al.  Recognizing geons from superquadrics fitted to range data , 1992, Image Vis. Comput..

[35]  Sven J. Dickinson,et al.  The Use of Geons for Generic 3D Object Recognition , 1993, IJCAI.

[36]  Sven J. Dickinson,et al.  Integration of quantitative and qualitative techniques for deformable model fitting from orthographic, perspective, and stereo projections , 1993, 1993 (4th) International Conference on Computer Vision.

[37]  Michael Jenkin,et al.  Spatial vision in humans and robots , 1994 .

[38]  Sven J. Dickinson,et al.  Qualitative tracking of 3-D objects using active contour networks , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[39]  John K. Tsotsos An inhibitory beam for attentional selection , 1994 .

[40]  James L. Crowley,et al.  Vision as Process , 1995 .

[41]  Sven J. Dickinson,et al.  A quantitative analysis of view degeneracy and its use for active focal length control , 1995, Proceedings of IEEE International Conference on Computer Vision.

[42]  Jiri Matas,et al.  Control of Scene Interpretation , 1995 .

[43]  David G. Lowe,et al.  Perceptual Organization and Visual Recognition , 2012 .