Unsupervised visual alignment with similarity graphs

Alignment of semantically meaningful visual patterns, such as object classes, is an important pre-processing step for a number of applications, including object detection and image categorization. Because supervised alignment methods require expensive manual annotation, unsupervised alignment techniques are preferable, especially for large-scale problems. Image congealing methods achieve fine adjustment effectively and efficiently, but they require a moderately good initialization, which is often unavailable in practice. Aligning examples of a visual class under large viewpoint changes remains an open problem. Feature-based methods solve the problem to some degree, but they require manual selection of a good seed image and ignore the fact that examples of a semantic class can be visually very different (e.g., Harley-Davidsons and scooters in “motorbikes”). In this work, we overcome these drawbacks by formulating visual similarity as a generalized assignment problem, which is solved by fast approximation and non-linear optimization. From pairwise image similarities we construct an image graph, which is used to align (“morph”) one image to another step by step by traversing the graph. We automatically find a suitable seed with a novel centrality measure that identifies “similarity hubs” in the graph. Operating in an unsupervised manner, the proposed approach outperforms state-of-the-art methods on classes from popular benchmark datasets.
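The graph part of this pipeline (pairwise similarities, then an image graph, then seed selection, then step-wise alignment along graph paths) can be illustrated with a short sketch. It is not the authors' implementation and rests on several assumptions: the pairwise similarity matrix is taken as given (the generalized assignment formulation that produces it is not reproduced), edges form a symmetric k-nearest-neighbour graph with cost 1 - similarity, and standard closeness centrality stands in for the paper's novel “similarity hub” measure. All function names and the toy matrix are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[int, Dict[int, float]]

# Made-up similarities for five images (symmetric, values in [0, 1]); not data from the paper.
TOY_SIM = [
    [1.0, 0.9, 0.3, 0.1, 0.1],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.3, 0.7, 1.0, 0.7, 0.3],
    [0.1, 0.2, 0.7, 1.0, 0.9],
    [0.1, 0.1, 0.3, 0.9, 1.0],
]


def build_knn_graph(sim: List[List[float]], k: int) -> Graph:
    """Hypothetical construction: link each image to its k most similar images.

    Edge cost = 1 - similarity, so cheap paths run through chains of similar images.
    Edges are mirrored (union kNN graph), so a node may end up with more than k neighbours.
    """
    n = len(sim)
    graph: Graph = {i: {} for i in range(n)}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])
        for j in others[:k]:
            cost = 1.0 - sim[i][j]
            graph[i][j] = cost
            graph[j][i] = cost  # keep the graph undirected
    return graph


def dijkstra(graph: Graph, source: int) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Shortest-path costs from `source` plus predecessor links for path recovery."""
    dist = {source: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def closeness(graph: Graph, node: int) -> float:
    """Closeness centrality, scaled (Wasserman-Faust) to penalize nodes that reach few others."""
    dist, _ = dijkstra(graph, node)
    reachable = len(dist) - 1  # other nodes reachable from `node`
    total = sum(dist.values())
    if reachable == 0:
        return 0.0
    if total == 0.0:
        return float("inf")  # degenerate: every reachable image is identical to this one
    return (reachable / (len(graph) - 1)) * (reachable / total)


def pick_seed(graph: Graph) -> int:
    """Stand-in for the paper's hub measure: the most central image becomes the seed."""
    return max(graph, key=lambda v: closeness(graph, v))


def alignment_chain(prev: Dict[int, int], seed: int, target: int) -> List[int]:
    """Images to pass through when aligning `target` to `seed`; [] if unreachable."""
    if target != seed and target not in prev:
        return []
    chain = [target]
    while chain[-1] != seed:
        chain.append(prev[chain[-1]])
    return chain


if __name__ == "__main__":
    graph = build_knn_graph(TOY_SIM, k=2)
    seed = pick_seed(graph)
    _, prev = dijkstra(graph, seed)
    # With this toy matrix the seed is image 2; image 0 goes via 1 and image 4 via 3.
    print("seed:", seed)
    for img in graph:
        print(img, alignment_chain(prev, seed, img))
```

In this sketch each image would then be aligned to its successor on the chain, with the pairwise transformations composed toward the seed, so a visually distant image reaches the seed through intermediate, more similar ones. How the per-edge alignments are estimated and composed is deliberately left out.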
