Discovering Multi-relational Latent Attributes by Visual Similarity Networks

The key problems in visual object classification are: learning discriminative feature to distinguish between two or more visually similar categories (e.g. dogs and cats), modeling the variation of visual appearance within instances of the same class (e.g. Dalmatian and Chihuahua in the same category of dogs), and tolerate imaging distortion (3D pose). These account to within and between class variance in machine learning terminology, but in recent works these additional pieces of information, latent dependency, have been shown to be beneficial for the learning process. Latent attribute space was recently proposed and verified to capture the latent dependent correlation between classes. Attributes can be annotated manually, but more attempting is to extract them in an unsupervised manner. Clustering is one of the popular unsupervised approaches, and the recent literature introduces similarity measures that help to discover visual attributes by clustering. However, the latent attribute structure in real life is multi-relational, e.g. two different sport cars in different poses vs. a sport car and a family car in the same pose - what attribute can dominate similarity? Instead of clustering, a network (graph) containing multiple connections is a natural way to represent such multi-relational attributes between images. In the light of this, we introduce an unsupervised framework for network construction based on pairwise visual similarities and experimentally demonstrate that the constructed network can be used to automatically discover multiple discrete (e.g. sub-classes) and continuous (pose change) latent attributes. Illustrative examples with publicly benchmarking datasets can verify the effectiveness of capturing multi- relation between images in the unsupervised style by our proposed network.

[1]  P. Fua,et al.  Pose estimation for category specific multiview object localization , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[3]  Fei-Fei Li,et al.  Attribute Learning in Large-Scale Datasets , 2010, ECCV Workshops.

[4]  Andrew Zisserman,et al.  Learning Visual Attributes , 2007, NIPS.

[5]  Joni-Kristian Kämäräinen,et al.  Local Feature Based Unsupervised Alignment of Object Class Images , 2011, BMVC.

[6]  Andrew Zisserman,et al.  Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets , 2011, International Journal of Computer Vision.

[7]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[8]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Arnold W. M. Smeulders,et al.  Fine-Grained Categorization by Alignments , 2013, 2013 IEEE International Conference on Computer Vision.

[10]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[11]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[12]  Shaogang Gong,et al.  Constructing Robust Affinity Graphs for Spectral Clustering , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Andrew Zisserman,et al.  Efficient discriminative learning of parts-based models , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[14]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Cordelia Schmid,et al.  Label-Embedding for Attribute-Based Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Stefan Carlsson,et al.  Mixture Component Identification and Learning for Visual Recognition , 2012, ECCV.

[18]  Jian Dong,et al.  Subcategory-Aware Object Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Christos Faloutsos,et al.  Unsupervised modeling of object categories using link analysis techniques , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Christoph H. Lampert,et al.  Unsupervised Object Discovery: A Comparison , 2010, International Journal of Computer Vision.

[21]  Joni-Kristian Kämäräinen,et al.  Unsupervised object discovery via self-organisation , 2012, Pattern Recognit. Lett..

[22]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[23]  Edwin R. Hancock,et al.  Incrementally Discovering Object Classes Using Similarity Propagation and Graph Clustering , 2009, ACCV.

[24]  Heesoo Myeong,et al.  Learning object relationships via graph-based context model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Andrew Zisserman,et al.  Deep Fisher Networks for Large-Scale Image Classification , 2013, NIPS.

[26]  Andrew Zisserman,et al.  Advances in Neural Information Processing Systems (NIPS) , 2007 .

[27]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Luc Van Gool,et al.  Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions , 2000, BMVC.

[29]  Alexei A. Efros,et al.  Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships , 2009, NIPS.

[30]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.