Unsupervised Feature Learning by Cross-Level Discrimination between Instances and Groups

Unsupervised feature learning has made great strides with invariant mapping and instance-level discrimination, as benchmarked by classification on common datasets. However, these datasets are curated to be distinctive and class-balanced, whereas naturally collected data could be highly correlated within the class (with repeats at the extreme) and long-tail distributed across classes. The natural grouping of instances conflicts with the fundamental assumption of instance-level discrimination. Contrastive feature learning is thus unstable without grouping, whereas grouping without contrastive feature learning is easily trapped into degeneracy. We propose to integrate grouping into instance-level discrimination, not by imposing group-level discrimination, but by imposing cross-level discrimination between instances and groups. Our key insight is that grouping results from not just attraction, but also repulsion. While invariant mapping is achieved by local attraction between augmented instances, instance similarity emerges from long-range repulsion against common instance groups. To further avoid the clash between grouping and discrimination objectives, we also impose them on separate features derived from the common feature. Our extensive experimentation demonstrates not only significant gain on datasets with high correlation and long-tail distributions, but also leading performance on multiple self-supervision and semi-supervision benchmarks, bringing unsupervised feature learning closer to real data applications.

[1]  Ali Razavi,et al.  Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.

[2]  Alexander A. Alemi,et al.  On Variational Bounds of Mutual Information , 2019, ICML.

[3]  R Devon Hjelm,et al.  Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.

[4]  Stella X. Yu,et al.  Unsupervised Feature Learning via Non-parametric Instance Discrimination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Jieping Ye,et al.  Discriminative K-means for Clustering , 2007, NIPS.

[6]  Paolo Favaro,et al.  Self-Supervised Feature Learning by Learning to Spot Artifacts , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Chen Change Loy,et al.  Online Deep Clustering for Unsupervised Representation Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Gregory Shakhnarovich,et al.  Colorization as a Proxy Task for Visual Understanding , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Dhruv Batra,et al.  Joint Unsupervised Learning of Deep Representations and Image Clusters , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Yi Yang,et al.  Image Clustering Using Local Discriminant Models and Global Integration , 2010, IEEE Transactions on Image Processing.

[11]  Trevor Darrell,et al.  Adversarial Feature Learning , 2016, ICLR.

[12]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Laurens van der Maaten,et al.  Self-Supervised Learning of Pretext-Invariant Representations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Stella X. Yu,et al.  SegSort: Segmentation by Discriminative Sorting of Segments , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Stella X. Yu,et al.  Finding dots: Segmentation as popping out regions from boundaries , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Shih-Fu Chang,et al.  Unsupervised Embedding Learning via Invariant and Spreading Instance Feature , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[18]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Jeff Donahue,et al.  Large Scale Adversarial Representation Learning , 2019, NeurIPS.

[20]  Michael Tschannen,et al.  On Mutual Information Maximization for Representation Learning , 2019, ICLR.

[21]  Enhong Chen,et al.  Learning Deep Representations for Graph Clustering , 2014, AAAI.

[22]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[23]  Takeo Kanade,et al.  Discriminative cluster analysis , 2006, ICML.

[24]  Oriol Vinyals,et al.  Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.

[25]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[26]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Taesup Kim,et al.  Fast AutoAugment , 2019, NeurIPS.

[28]  Jianbo Shi,et al.  Understanding popout through repulsion , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[29]  Jana Kosecka,et al.  Multiview RGB-D Dataset for Object Instance Detection , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[30]  Xu Ji,et al.  Invariant Information Clustering for Unsupervised Image Classification and Segmentation , 2019 .

[31]  Aapo Hyvärinen,et al.  Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics , 2012, J. Mach. Learn. Res..

[32]  Kaiming He,et al.  Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[34]  Ivor W. Tsang,et al.  Spectral Embedded Clustering: A Framework for In-Sample and Out-of-Sample Spectral Clustering , 2011, IEEE Transactions on Neural Networks.

[35]  Thomas Brox,et al.  Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Dahua Lin,et al.  Self-Supervised Learning via Conditional Motion Propagation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[38]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[39]  Jianbo Shi,et al.  Segmentation with Pairwise Attraction and Repulsion , 2001, ICCV.

[40]  Cordelia Schmid,et al.  Contrastive Bidirectional Transformer for Temporal Representation Learning , 2019, ArXiv.

[41]  Ali Farhadi,et al.  Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.

[42]  Phillip Isola,et al.  Contrastive Multiview Coding , 2019, ECCV.

[43]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[44]  Nikos Komodakis,et al.  Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.

[45]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[47]  Chengxu Zhuang,et al.  Local Aggregation for Unsupervised Learning of Visual Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[48]  Yoshua Bengio,et al.  Learning deep representations by mutual information estimation and maximization , 2018, ICLR.

[49]  Julien Mairal,et al.  Unsupervised Pre-Training of Image Features on Non-Curated Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[51]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[52]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[53]  Stella X. Yu,et al.  Large-Scale Long-Tailed Recognition in an Open World , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Matthijs Douze,et al.  Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[55]  Xu Ji,et al.  Invariant Information Clustering for Unsupervised Image Classification and Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[56]  Pietro Perona,et al.  Object detection and segmentation from joint embedding of parts and pixels , 2011, 2011 International Conference on Computer Vision.

[57]  Kaiming He,et al.  Improved Baselines with Momentum Contrastive Learning , 2020, ArXiv.

[58]  Liam Paninski,et al.  Estimation of Entropy and Mutual Information , 2003, Neural Computation.

[59]  Laurens van der Maaten,et al.  Learning a Parametric Embedding by Preserving Local Structure , 2009, AISTATS.

[60]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[61]  Andrew Zisserman,et al.  Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[62]  Jianhong Wu,et al.  Data clustering - theory, algorithms, and applications , 2007 .

[63]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[64]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).