Recovering the Unbiased Scene Graphs from the Biased Ones

Given input images, scene graph generation (SGG) aims to produce comprehensive, graphical representations describing visual relationships among salient objects. Recently, more attention has been paid to the long tail problem in SGG; however, the imbalance in the fraction of missing labels across classes, known as reporting bias, exacerbates the long tail yet is rarely considered and cannot be solved by existing debiasing methods. In this paper we show that, due to the missing labels, SGG can be viewed as a "Learning from Positive and Unlabeled data" (PU learning) problem, where the reporting bias can be removed by using label frequencies to recover the unbiased probabilities from the biased ones. The label frequency of a class is the fraction of its positive examples that are labeled. To obtain accurate label frequency estimates, we propose Dynamic Label Frequency Estimation (DLFE), which takes advantage of training-time data augmentation and averages over multiple training iterations to introduce more valid examples. Extensive experiments show that DLFE estimates label frequencies more effectively than a naive variant of the traditional estimate, significantly alleviates the long tail, and achieves state-of-the-art debiasing performance on the Visual Genome (VG) dataset. We also show qualitatively that SGG models with DLFE produce markedly more balanced and unbiased scene graphs. The source code is publicly available.
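To make the recovery step concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard PU-learning identity from Elkan and Noto (under the "selected completely at random" assumption): the unbiased probability of a class equals the biased probability divided by that class's label frequency, and the label frequency can be estimated as the average biased probability the model assigns to a class on examples labeled with that class. The function names, the batch format, and the final renormalization are hypothetical choices made for illustration.

```python
def estimate_label_frequencies(batches, num_classes):
    """Average, per class, the biased probability on labeled positives.

    `batches` is an iterable of lists of (label, probs) pairs, where `label`
    is the annotated class index and `probs` is the model's biased
    probability distribution for that example. Accumulating over many
    training batches (assumed here to mimic DLFE's averaging over
    iterations) gives every class more examples to average over.
    """
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for batch in batches:
        for label, probs in batch:
            sums[label] += probs[label]
            counts[label] += 1
    # Classes with no labeled example so far have no estimate (None).
    return [s / n if n else None for s, n in zip(sums, counts)]


def recover_unbiased(probs, label_freqs, eps=1e-8):
    """Divide biased probabilities by label frequencies.

    Classes without an estimate are left unchanged. The renormalization at
    the end is an illustrative assumption, used only so the scores sum to 1.
    """
    scores = [
        p if c is None else p / max(c, eps)
        for p, c in zip(probs, label_freqs)
    ]
    total = sum(scores)
    return [s / total for s in scores]


# Toy data: class 0 is usually labeled, class 1 is often left unlabeled.
batches = [
    [(0, [0.8, 0.2]), (1, [0.6, 0.4])],
    [(0, [1.0, 0.0]), (1, [0.8, 0.2])],
]
freqs = estimate_label_frequencies(batches, num_classes=2)
unbiased = recover_unbiased([0.6, 0.4], freqs)
```

In this toy setup the estimated label frequencies are roughly 0.9 for class 0 and 0.3 for class 1, so a biased prediction of [0.6, 0.4] is re-weighted in favor of the under-labeled class 1.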
