Paper information - Feature-Driven Emergence of Model Graphs for Object Recognition and Categorization

An important requirement for the expression of cognitive structures is the ability to form mental objects by rapidly binding together constituent parts. In this sense, one may conceive of the brain's data structure as having the form of graphs whose nodes are labeled with elementary features. These provide a versatile data format with the additional ability to render the structure of any mental object. Because of the multitude of possible object variations, the graphs are required to be dynamic. Upon presentation of an image, a so-called model graph should rapidly emerge by binding together memorized subgraphs derived from earlier learning examples, driven by the image features. In this model, the richness and flexibility of the mind is made possible by a combinatorial game of immense complexity. Consequently, the emergence of model graphs is a laborious task which, in computer vision, has most often been disregarded in favor of employing model graphs tailored to specific object categories such as faces in frontal pose. Recognition or categorization of arbitrary objects, however, demands dynamic graphs. In this work we propose a form of graph dynamics that proceeds in two steps. In the first step, component classifiers, which decide whether a feature is present in an image, are learned from training images. For processing arbitrary objects, features are small localized grid graphs, so-called parquet graphs, whose nodes are attributed with Gabor amplitudes. Through combination of these classifiers into a linear discriminant that conforms to Linsker's infomax principle, a weighted majority voting scheme is implemented. It allows for preselection of salient learning examples, so-called model candidates, and likewise for preselection of the categories to which the object in the presented image presumably belongs.
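The weighted majority voting used for preselection can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `preselect`, the weight table layout, and the `keep_ratio` threshold are assumptions made for illustration. In particular, the weights here are arbitrary non-negative numbers rather than values derived from the infomax principle.

```python
from collections import defaultdict


def preselect(fired_features, weights, keep_ratio=0.8):
    """Weighted majority vote over the features detected in an image.

    fired_features: ids of features whose component classifier fired.
    weights: dict mapping feature id -> {candidate: weight}
             (hypothetical layout, weights assumed non-negative).
    keep_ratio: hypothetical threshold; a candidate is kept if its score
                reaches this fraction of the best score.
    """
    scores = defaultdict(float)
    for feature in fired_features:
        # Each detected feature casts a weighted vote for every candidate
        # it is associated with.
        for candidate, weight in weights.get(feature, {}).items():
            scores[candidate] += weight
    if not scores:
        return []
    best = max(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [cand for cand, score in ranked if score >= keep_ratio * best]


# Toy weight table: three features voting for three candidates.
weights = {
    "f1": {"cup": 1.0, "car": 0.2},
    "f2": {"cup": 0.5},
    "f3": {"car": 0.9, "duck": 0.4},
}
selected = preselect(["f1", "f2", "f3"], weights)
```

Lowering `keep_ratio` passes more candidates on to the verification step, trading speed for a lower risk of discarding the correct model early.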
Each model candidate is verified in a second step using a variant of elastic graph matching, a standard correspondence-based technique for face and object recognition. To further differentiate between model candidates with similar features, the features are required to be in a similar spatial arrangement for the model to be selected. Model graphs are constructed dynamically by assembling model features into larger graphs according to their spatial arrangement. From the viewpoint of pattern recognition, the presented technique is a combination of a discriminative (feature-based) and a generative (correspondence-based) classifier, while the majority voting scheme implemented in the feature-based part is an extension of existing multiple feature subset methods. We report the results of experiments on standard databases for object recognition and categorization. The method achieved high recognition rates on identity, object category, pose, and illumination type. Unlike many other models, the presented technique can also cope with varying background, multiple objects, and partial occlusion.
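The requirement that matched features lie in a similar spatial arrangement can be sketched as a comparison of edge vectors between corresponding nodes. This is an illustrative assumption, not the matching variant used in the paper: the function `edge_distortion` and the squared-difference measure are hypothetical choices, picked because they ignore a pure translation of the object while penalizing rearranged features.

```python
def edge_distortion(model_pos, image_pos, edges):
    """Mean squared difference between corresponding edge vectors.

    model_pos, image_pos: lists of (x, y) node positions; node k in the
                          model corresponds to node k in the image.
    edges: list of (i, j) node index pairs defining the graph edges.
    """
    total = 0.0
    for i, j in edges:
        # Edge vector in the model graph.
        mdx = model_pos[j][0] - model_pos[i][0]
        mdy = model_pos[j][1] - model_pos[i][1]
        # Edge vector between the corresponding image nodes.
        idx = image_pos[j][0] - image_pos[i][0]
        idy = image_pos[j][1] - image_pos[i][1]
        total += (mdx - idx) ** 2 + (mdy - idy) ** 2
    return total / len(edges)


model = [(0, 0), (10, 0), (0, 10)]
edges = [(0, 1), (0, 2), (1, 2)]

# Same arrangement, shifted by (5, 5): no distortion.
shifted = [(5, 5), (15, 5), (5, 15)]
# Nodes 1 and 2 swapped: same features, different arrangement.
swapped = [(5, 5), (5, 15), (15, 5)]

d_shifted = edge_distortion(model, shifted, edges)
d_swapped = edge_distortion(model, swapped, edges)
```

A candidate whose features all appear in the image but in a different layout receives a large distortion and can be rejected, which is the distinction the feature-based vote alone cannot make.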


[41] D. V. van Essen,et al. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[42] M. Tarr. Rotating objects to recognize them: A case study on the role of viewpoint dependency in the recognition of three-dimensional objects , 1995, Psychonomic bulletin & review.

[43] Ching Y. Suen,et al. Application of majority voting to pattern recognition: an analysis of its behavior and performance , 1997, IEEE Trans. Syst. Man Cybern. Part A.

[44] Bernd Fritzke,et al. A Self-Organizing Network that Can Follow Non-stationary Distributions , 1997, ICANN.

[45] Denis Fize,et al. Speed of processing in the human visual system , 1996, Nature.

[46] Ian Witten,et al. Data Mining , 2000 .

[47] Pietro Perona,et al. A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[48] Andreas H. J. Tewes. A flexible object model for encoding and matching human faces , 2006 .

[49] Ian H. Witten,et al. Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[50] Christoph von der Malsburg,et al. Dynamic link architecture , 1998 .

[51] Takeo Kanade,et al. A statistical approach to 3d object detection applied to faces and cars , 2000 .

[52] S. Ullman. Aligning pictorial descriptions: An approach to object recognition , 1989, Cognition.

[53] Terence Sim,et al. The CMU Pose, Illumination, and Expression (PIE) database , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[54] Rolf P. Würtz,et al. Object Recognition Robust Under Translations, Deformations, and Changes in Background , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[55] Rolf P. Würtz,et al. Fast object and pose recognition through minimum entropy coding , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[56] King-Sun Fu,et al. An Image Understanding System Using Attributed Symbolic Representation and Inexact Graph-Matching , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57] Luc Vandendorpe,et al. Face authentication test on the BANCA database , 2004, ICPR 2004.

[58] Laurenz Wiskott,et al. Labeled graphs and dynamic link matching for face recognition and scene analysis , 1995 .

[59] Shimon Edelman,et al. Representation, similarity, and the chorus of prototypes , 1993, Minds and Machines.

[60] Arnold W. M. Smeulders,et al. The Amsterdam Library of Object Images , 2004, International Journal of Computer Vision.

[61] Heinrich H. Bülthoff,et al. Psychophysical support for a 2D view interpolation theory of object recognition , 1991 .

[62] David J. Field,et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[63] Horst Bischof,et al. Entropy based Saliency Maps for Object Recognition , 2004 .

[64] Michael A. Arbib,et al. The handbook of brain theory and neural networks , 1995, A Bradford book.

[65] R. G. Morris. D.O. Hebb: The Organization of Behavior, Wiley: New York; 1949 , 1999, Brain Research Bulletin.

[66] Sameer A. Nene,et al. Columbia Object Image Library (COIL100) , 1996 .

[67] J. P. Jones,et al. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. , 1987, Journal of neurophysiology.

[68] I. Biederman,et al. Dynamic binding in a neural network for shape recognition. , 1992, Psychological review.

[69] Christoph von der Malsburg,et al. Pattern recognition by labeled graph matching , 1988, Neural Networks.

[70] Takayuki Ito,et al. Neocognitron: A neural network model for a mechanism of visual pattern recognition , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[71] Gabriele Peters,et al. A view based approach to three-dimensional object perception , 2002 .

[72] R. Gray,et al. Vector quantization , 1984, IEEE ASSP Magazine.

[73] Hartmut S. Loos. User assisted learning of visual object recognition , 2003 .

[74] Shimon Ullman,et al. Object Classification Using a Fragment-Based Representation , 2000, Biologically Motivated Computer Vision.

[75] Christoph von der Malsburg,et al. The Correlation Theory of Brain Function , 1994 .

[76] Ingo Wundrich. Parametrisierte zweidimensionale Modelle für dreidimensionale Gesichtserkennung , 2004 .

[77] Jan Wieghardt,et al. Learning the topology of views: from images to objects , 2001 .

[78] W. Pitts,et al. How we know universals; the perception of auditory and visual forms. , 1947, The Bulletin of mathematical biophysics.

[79] I. Biederman. Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[80] N. Logothetis,et al. Psychophysical and physiological evidence for viewer-centered object representations in the primate. , 1995, Cerebral cortex.

[81] Frank Rosenblatt,et al. PRINCIPLES OF NEURODYNAMICS. PERCEPTRONS AND THE THEORY OF BRAIN MECHANISMS , 1963 .

[82] Bernt Schiele,et al. Analyzing appearance and contour based methods for object categorization , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[83] Joachim M. Buhmann,et al. Distortion Invariant Object Recognition in the Dynamic Link Architecture , 1993, IEEE Trans. Computers.

[84] Pietro Perona,et al. Unsupervised Learning of Models for Recognition , 2000, ECCV.

[85] Bartlett W. Mel. SEEMORE: Combining Color, Shape, and Texture Histogramming in a Neurally Inspired Approach to Visual Object Recognition , 1997, Neural Computation.

[86] D. Marr,et al. Representation and recognition of the spatial organization of three-dimensional shapes , 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.