Shape Matching and Object Recognition

We address the problem of comparing related but not identical shapes in images, following a deformable template strategy. At the heart of this approach is the notion of an alignment between the shapes to be matched. The transformation necessary for alignment and the differences remaining after alignment are then used to make a comparison. A model determines what kinds of deformations or alignments are acceptable, and what variation in appearance should remain after alignment. This ties in closely with the idea that the difference in shape is the residual difference after some family of transformations has been applied for alignment. Finding an alignment of a model to a novel object involves a search through the space of possible alignments. In many settings this search is quite difficult. This work shows that the search can be approximated by an easier discrete matching problem between key points on a model and a novel object. This is a departure from traditional approaches to deformable template matching that concentrate on analyzing differential models. This thesis presents theories and experiments on searching for, identifying, and using alignments found via discrete matchings. In particular, we present a mathematical and ecological motivation for a medium-scale descriptor of shape, geometric blur. Geometric blur is an average over transformations of a sparse signal or feature channel, and can be computed using a spatially varying convolution. The resulting shape descriptors are useful for evaluating local shape similarity. Experiments demonstrate their efficacy for image classification and shape correspondence. Finding alignments between shapes is formulated as an optimization problem over discrete matchings between feature points in images. Similarity between putative correspondences is measured using geometric blur, and the deformation in the configuration of points is measured by summing over deformations in pairwise relationships. The matching problem is formulated as an integer quadratic programming problem and approximated with a simple technique. Experimental results indicate that this generic model of local shape and deformation is applicable across a wide variety of object categories, providing good (currently the best known) performance for object recognition and localization on a difficult object recognition benchmark. Furthermore, this generic object alignment strategy can be used to model variation in images of an object category, identifying the repeated object structures and providing automatic localization of the objects.
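To make the two ingredients above concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the thesis implementation. The sparse signal is assumed to be a set of unit impulses at the feature points, so a Gaussian blur of it can be evaluated directly as a sum of Gaussians. The blur width is assumed to grow linearly with distance from the descriptor center (`alpha` and `beta` are hypothetical parameters), and the ring sampling pattern, the `weight` on the pairwise term, and all function names are likewise assumptions. The quadratic objective is minimized by brute-force enumeration, which stands in for the approximation technique mentioned above and is feasible only for a handful of points.

```python
import math
from itertools import permutations

def ring_offsets(radii=(2.0, 4.0, 8.0), n_angles=8):
    """Sample positions around a feature point (assumed pattern: center plus rings)."""
    offsets = [(0.0, 0.0)]
    for r in radii:
        for k in range(n_angles):
            t = 2 * math.pi * k / n_angles
            offsets.append((r * math.cos(t), r * math.sin(t)))
    return offsets

def geometric_blur(points, center, offsets, alpha=0.5, beta=1.0):
    """Blur a sparse signal (unit impulses at `points`) with a Gaussian whose
    width grows with distance from `center`, then sample it at `offsets`."""
    descriptor = []
    for dx, dy in offsets:
        sigma = alpha * math.hypot(dx, dy) + beta  # spatially varying blur (assumed linear)
        sx, sy = center[0] + dx, center[1] + dy
        value = 0.0
        for px, py in points:
            d2 = (sx - px) ** 2 + (sy - py) ** 2
            value += math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
        descriptor.append(value)
    norm = math.sqrt(sum(v * v for v in descriptor)) or 1.0
    return [v / norm for v in descriptor]  # unit length, so dot product = correlation

def match_cost(assignment, model_pts, target_pts, desc_cost, weight=0.1):
    """Linear descriptor term plus a quadratic term over pairs of correspondences."""
    cost = sum(desc_cost[i][j] for i, j in enumerate(assignment))
    for a in range(len(assignment)):
        for b in range(a + 1, len(assignment)):
            ma, mb = model_pts[a], model_pts[b]
            ta, tb = target_pts[assignment[a]], target_pts[assignment[b]]
            d_model = math.hypot(ma[0] - mb[0], ma[1] - mb[1])
            d_target = math.hypot(ta[0] - tb[0], ta[1] - tb[1])
            cost += weight * abs(d_model - d_target)  # assumed distortion: change in length
    return cost

def best_matching(model_pts, target_pts, desc_cost, weight=0.1):
    """Exhaustive search over one-to-one assignments (tiny problems only)."""
    candidates = permutations(range(len(target_pts)), len(model_pts))
    return min(candidates,
               key=lambda a: match_cost(a, model_pts, target_pts, desc_cost, weight))

# Toy data: the target is the model translated by (5, -3) and listed in a different order.
model_pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 6.0), (3.0, 9.0)]
order = [2, 0, 3, 1]  # target point j is a copy of model point order[j]
target_pts = [(model_pts[k][0] + 5.0, model_pts[k][1] - 3.0) for k in order]

offsets = ring_offsets()
model_desc = [geometric_blur(model_pts, p, offsets) for p in model_pts]
target_desc = [geometric_blur(target_pts, p, offsets) for p in target_pts]

# Descriptor dissimilarity: one minus normalized correlation.
desc_cost = [[1.0 - sum(u * v for u, v in zip(md, td)) for td in target_desc]
             for md in model_desc]

matching = best_matching(model_pts, target_pts, desc_cost)
print(matching)  # matching[i] is the target index assigned to model point i
```

In this toy case the target is an exact translate of the model, so the correct correspondence has near-zero descriptor cost and zero pairwise distortion. Real images would add clutter, missing points, and deformation, where both the descriptor design and the choice of optimizer matter far more than this sketch suggests.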

[20]  Jitendra Malik,et al.  Matching Shapes , 2001, ICCV.

[21]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[22]  Jitendra Malik,et al.  Geometric blur for template matching , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[23]  Eric Mjolsness,et al.  A relationship between spline-based deformable models and weighted graphs in non-rigid matching , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[24]  Krystian Mikolajczyk,et al.  Detection of local features invariant to affines transformations , 2002 .

[25]  Michel Vidal-Naquet,et al.  Visual features of intermediate complexity and their use in classification , 2002, Nature Neuroscience.

[26]  Cordelia Schmid,et al.  3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[27]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[28]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[29]  Anand Rangarajan,et al.  A new point matching algorithm for non-rigid registration , 2003, Comput. Vis. Image Underst..

[30]  João Paulo Costeira,et al.  A Global Solution to Sparse Correspondence Problems , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, ICCV 2003.

[32]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  A. Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[34]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[35]  Tamara L. Berg,et al.  Names and faces in the news , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[36]  H. Schneiderman Feature-centric evaluation for efficient cascaded object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[37]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[38]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[39]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[40]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Laurence Commissaire divisionnaire,et al.  Mid-level Cues Improve Boundary Detection , 2005 .

[42]  Jitendra Malik,et al.  Efficient shape matching using shape contexts , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Pietro Perona,et al.  Combining generative models and Fisher kernels for object recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.