A Computational Learning Theory of Active Object Recognition Under Uncertainty

We present theoretical results on the problem of actively searching a 3D scene to determine the positions of one or more pre-specified objects. We investigate how input noise, occlusion, and the VC-dimensions of the related representation classes affect the localization of all objects present in the search region, under finite computational resources and a search cost constraint. We derive a number of bounds relating the noise rate of low-level feature detection to the VC-dimension of an object representable by an architecture satisfying the given computational constraints. We prove that, under certain conditions, the corresponding classes of object localization and recognition problems are efficiently learnable in the presence of noise and under a purposive learning strategy: there exists a polynomial upper bound on the minimum number of examples necessary to correctly localize the targets under the given models of uncertainty. We use the same arguments to show that passive approaches to the problem do not necessarily guarantee efficient learnability. Under this formulation, we prove the existence of a number of emergent relations between the object detection noise rate, the scene representation length, the object class complexity, and the representation class complexity. These relations demonstrate that selective attention is necessary not only because of computational complexity constraints, but also as a noise-suppression mechanism and as a mechanism for efficient object class learning. These results concretely demonstrate the advantages of active, purposive, and attentive approaches for solving complex vision problems.
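To give a feel for what a polynomial bound on the number of examples under noise can look like, the sketch below evaluates the classical Angluin-Laird sample-size bound for a finite hypothesis class under random classification noise. This is an illustration only and is not one of the bounds derived in this paper. The function name `angluin_laird_sample_size` and its parameters are hypothetical, and the finite class size `num_hypotheses` stands in for the VC-dimension-based complexity measures discussed above.

```python
import math


def angluin_laird_sample_size(num_hypotheses, epsilon, delta, noise_bound):
    """Illustrative sample size for learning under classification noise.

    Evaluates the classical bound for a finite hypothesis class:
        m >= 2 / (epsilon^2 * (1 - 2*noise_bound)^2) * ln(2 * N / delta)
    where N is the class size and noise_bound < 1/2 is a known upper
    bound on the label noise rate. Not the bound proved in the paper.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly in (0, 1)")
    if not (0.0 <= noise_bound < 0.5):
        raise ValueError("noise_bound must lie in [0, 0.5)")

    # The gap (1 - 2*noise_bound) shrinks to zero as the noise rate
    # approaches 1/2, so the required sample size grows without bound.
    gap = 1.0 - 2.0 * noise_bound
    samples = 2.0 / (epsilon ** 2 * gap ** 2) * math.log(2.0 * num_hypotheses / delta)
    return math.ceil(samples)


if __name__ == "__main__":
    # Assumed example values: 1000 hypotheses, 10% error, 95% confidence.
    for eta in (0.0, 0.1, 0.25, 0.4):
        m = angluin_laird_sample_size(1000, epsilon=0.1, delta=0.05, noise_bound=eta)
        print(f"noise bound {eta:.2f}: {m} examples")
```

The required number of examples is polynomial in 1/epsilon, 1/(1 - 2*noise_bound), and the logarithms of the class size and 1/delta. Reducing the effective noise rate, which the abstract attributes to selective attention, directly lowers the required sample size in bounds of this form.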
