Local Feature Based Unsupervised Alignment of Object Class Images

Alignment of objects is a predominant problem in visual object categorisation (VOC). State-of-the-art part-based VOC methods try to automatically learn object parts and their spatial variation, which is difficult for objects in arbitrary poses. A straightforward solution is to annotate images with a set of “object landmarks”, but because of the laborious work this requires, less supervised methods are preferred. Effective semi-supervised VOC methods have been introduced, but none of them explicitly defines an alignment procedure or studies its effect on overall VOC performance. Unsupervised alignment has been recognised as a problem in its own right, referred to as “spatial image congealing”, and a number of congealing methods have been proposed. These methods mainly build on the seminal work of Learned-Miller [3, 4], extending and improving the original algorithm. The main drawback of the congealing methods is that they are iterative optimisation methods operating at the pixel level and thus require at least a moderate initial alignment to converge. Our approach [2] deviates from the congealing works in that we utilise local features instead of pixel-level processing, i.e. feature-based congealing. Our solution is more similar to those used in the part-based VOC methods, but we explicitly define the alignment algorithm and measure its performance.

[1]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[2]  Long Zhu,et al.  Unsupervised Learning of Probabilistic Object Models (POMs) for Object Classification, Segmentation, and Recognition Using Knowledge Propagation , 2009, IEEE Trans. Pattern Anal. Mach. Intell.

[3]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[4]  Iasonas Kokkinos,et al.  Unsupervised Learning of Object Deformation Models , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[5]  Brendan J. Frey,et al.  Transformed component analysis: joint estimation of spatial transformations and image components , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[6]  Narendra Ahuja,et al.  Unsupervised Category Modeling, Recognition, and Segmentation in Images , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell.

[8]  Pietro Perona,et al.  A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[9]  C. Schmid,et al.  Learning shape prior models for object matching , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, ECCV 2004.

[11]  Erik G. Learned-Miller,et al.  Unsupervised Joint Alignment of Complex Images , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Brendan J. Frey,et al.  Transformation-Invariant Clustering and Dimensionality Reduction Using EM , 2001.

[13]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[14]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Bernt Schiele,et al.  Local features for object class recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001.

[18]  Joni-Kristian Kämäräinen,et al.  Making Visual Object Categorization More Challenging: Randomized Caltech-101 Data Set , 2010, 2010 20th International Conference on Pattern Recognition.

[19]  Paul A. Viola,et al.  Learning from one example through shared densities on transforms , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[20]  Andrew Zisserman,et al.  Wide baseline stereo matching , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[21]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[22]  S. Umeyama,et al.  Least-Squares Estimation of Transformation Parameters Between Two Point Patterns , 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[23]  Changhu Wang,et al.  Spatial-bag-of-features , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[25]  Luc Van Gool,et al.  Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions , 2000, BMVC.

[26]  Erik G. Learned-Miller,et al.  Data driven image models through continuous joint alignment , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Sridha Sridharan,et al.  Least-squares congealing for large numbers of images , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[29]  Matthijs C. Dorst.  Distinctive Image Features from Scale-Invariant Keypoints , 2011.

[30]  Sridha Sridharan,et al.  Least squares congealing for unsupervised alignment of images , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Andrew Zisserman,et al.  Efficient discriminative learning of parts-based models , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[32]  Stefano Soatto,et al.  A Complexity-Distortion Approach to Joint Pattern Alignment , 2006, NIPS.

[33]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.