Unsupervised Representation Learning by Discovering Reliable Image Relations

Learning robust representations that make it possible to reliably establish relations between images is of paramount importance for virtually all of computer vision. Annotating the quadratic number of pairwise relations between training images is simply not feasible, while unsupervised inference is prone to noise, leaving the vast majority of these relations unreliable. To nevertheless find the relations that can be reliably used for learning, we follow a divide-and-conquer strategy: we find reliable similarities by extracting compact groups of images and reliable dissimilarities by partitioning these groups into subsets, converting the complicated overall problem into a few reliable local subproblems. For each of the subsets, we obtain a representation by learning a mapping to a target feature space that preserves their reliable relations. Transitivity relations between the subsets are then exploited to consolidate the local solutions into a concerted global representation. By iterating between grouping, partitioning, and learning, we can successively use more and more reliable relations, which in turn improves our image representation. In experiments, our approach achieves state-of-the-art performance on unsupervised classification on ImageNet with 46.0% and competes favorably on different transfer learning tasks on PASCAL VOC.
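The grouping and partitioning steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the seed selection, the nearest-neighbour grouping rule, the greedy partitioning, the `max_sim` threshold, and all function names are assumptions chosen for illustration, and the learning and transitivity steps are omitted. Compact groups stand in for reliable similarities, and groups placed in the same subset are mutually dissimilar, standing in for reliable dissimilarities.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def cosine(u, v):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))


def compact_groups(features, seeds, group_size):
    """Hypothetical grouping step: each seed and its nearest neighbours form a group."""
    x = [normalize(f) for f in features]
    groups = []
    for s in seeds:
        ranked = sorted(range(len(x)), key=lambda i: -cosine(x[s], x[i]))
        groups.append(ranked[:group_size])
    return groups, x


def partition_groups(groups, x, max_sim):
    """Hypothetical partitioning step: greedily assign each group to the first
    subset whose groups all have centroid similarity below max_sim."""
    dim = len(x[0])
    centroids = []
    for g in groups:
        mean = [sum(x[i][d] for i in g) / len(g) for d in range(dim)]
        centroids.append(normalize(mean))
    subsets = []
    for i in range(len(groups)):
        for subset in subsets:
            if all(cosine(centroids[i], centroids[j]) < max_sim for j in subset):
                subset.append(i)
                break
        else:
            # No existing subset is dissimilar enough, so open a new one.
            subsets.append([i])
    return subsets, centroids


# Toy data (assumed): three well-separated clusters of 20 points in 8 dimensions.
random.seed(0)
dim = 8
features = [
    [(5.0 if d == c else 0.0) + random.gauss(0, 1) for d in range(dim)]
    for c in range(3)
    for _ in range(20)
]
seeds = random.sample(range(len(features)), 6)
groups, x = compact_groups(features, seeds, group_size=5)
subsets, centroids = partition_groups(groups, x, max_sim=0.5)
print(subsets)
```

Each resulting subset is a small local problem: images within a group are treated as similar, and groups within the same subset are treated as dissimilar to one another.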
