DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering

Image segmentation is a fundamental task in computer vision. Data annotation for training supervised methods can be labor-intensive, motivating unsupervised methods. Current approaches often rely on extracting deep features from pre-trained networks to construct a graph, and classical clustering methods like k-means and normalized-cuts are then applied as a post-processing step. However, this approach reduces the high-dimensional information encoded in the features to pair-wise scalar affinities. To address this limitation, this study introduces a lightweight Graph Neural Network (GNN) to replace classical clustering methods while optimizing for the same clustering objective function. Unlike existing methods, our GNN takes both the pair-wise affinities between local image features and the raw features as input. This direct connection between the raw features and the clustering objective enables us to implicitly perform classification of the clusters between different graphs, resulting in part semantic segmentation without the need for additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training an image segmentation GNN. Furthermore, we employ the Correlation-Clustering (CC) objective to perform clustering without defining the number of clusters, allowing for k-less clustering. We apply the proposed method for object localization, segmentation, and semantic part segmentation tasks, surpassing state-of-the-art performance on multiple benchmarks1.

[1]  A. Vedaldi,et al.  Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  D. Vaufreydaz,et al.  Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  S. Bagon,et al.  Deep ViT Features as Dense Visual Descriptors , 2021, ArXiv.

[4]  Ross B. Girshick,et al.  Masked Autoencoders Are Scalable Vision Learners , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  A. Vedaldi,et al.  Unsupervised Part Discovery from Contrastive Reconstruction , 2021, NeurIPS.

[6]  Jean Ponce,et al.  Localizing Objects with Self-Supervised Transformers and no Labels , 2021, BMVC.

[7]  Oriol Vinyals,et al.  Highly accurate protein structure prediction with AlphaFold , 2021, Nature.

[8]  Julien Mairal,et al.  Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Armand Joulin,et al.  Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Saining Xie,et al.  An Empirical Study of Training Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Weiwei Jiang,et al.  Graph Neural Network for Traffic Forecasting: A Survey , 2021, Expert Syst. Appl..

[12]  Jean Ponce,et al.  Toward unsupervised, multi-object discovery in large-scale image collections , 2020, ECCV.

[13]  Artem Babenko,et al.  Object Segmentation Without Labels with Large-Scale Generative Models , 2020, ICML.

[14]  Yin Li,et al.  Interpretable and Accurate Fine-grained Recognition via Region Grouping , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Yong Jae Lee,et al.  Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Lior Wolf,et al.  OneGAN: Simultaneous Unsupervised Learning of Conditional Image Generation, Foreground Segmentation, and Fine-Grained Clustering , 2019, ECCV.

[17]  Cesare Alippi,et al.  Spectral Clustering with Graph Neural Networks for Graph Pooling , 2019, ICML.

[18]  Pavlo Molchanov,et al.  SCOPS: Self-Supervised Co-Part Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Sabine Süsstrunk,et al.  Deep Feature Factorization For Concept Discovery , 2018, ECCV.

[20]  Jure Leskovec,et al.  Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation , 2018, NeurIPS.

[21]  Huchuan Lu,et al.  Learning to Detect Salient Objects with Image-Level Supervision , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[23]  Jian Sun,et al.  ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jonathan T. Barron,et al.  The Fast Bilateral Solver , 2015, ECCV.

[25]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[26]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[27]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[28]  Li Xu,et al.  Hierarchical Saliency Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  K. V. D. Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[30]  Shai Bagon,et al.  A Multiscale Framework for Challenging Discrete Optimization , 2012, ArXiv.

[31]  Shai Bagon,et al.  Large Scale Correlation Clustering Optimization , 2011, ArXiv.

[32]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[33]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  J. Meigs,et al.  WHO Technical Report , 1954, The Yale Journal of Biology and Medicine.

[35]  Anthony Wirth,et al.  Correlation Clustering , 2017, Encyclopedia of Machine Learning and Data Mining.

[36]  J. Andrade-Cetto,et al.  Object Recognition , 2014, Computer Vision, A Reference Guide.

[37]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[38]  Christopher K. I. Williams,et al.  The Pascal Visual Object Classes Challenge 2006 ( VOC 2006 ) Results , 2006 .

[39]  and as an in , 2022 .