Activity and Relationship Modeling Driven Weakly Supervised Object Detection

This paper presents a weakly supervised object detection method based on activity label and relationship modeling, which is motivated by the assumption that configuration of human and object are similar in same activity, and joint modeling of human, active object and activity could leverage the recognition of them. Compared to most weakly supervised method taking object as independent instance, firstly, active human and object proposals are learned and filtered based on class activation map of multi-label classification. Secondly, a spatial relationship prior including relative position, scale, overlaps etc are learned dependent on action. Finally, a multi-stream object detection framework integrating the spatial prior and pairwise ROI pooling are proposed to jointly learn the object and action class. Experiments are conducted on HICO-DET dataset, and our approach outperforms the state of the art weakly supervised object detection methods.

[1]  Toshihiko Yamasaki,et al.  Object-aware Instance Labeling for Weakly Supervised Object Detection , 2020 .

[2]  Huimin Ma,et al.  WSODPB: Weakly supervised object detection with PCSNet and box regression module , 2020, Neurocomputing.

[3]  Kiyoharu Aizawa,et al.  Object-Aware Instance Labeling for Weakly Supervised Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Chang Liu,et al.  C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Hongyang Chao,et al.  WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Cewu Lu,et al.  Pairwise Body-Part Attention for Recognizing Human-Object Interactions , 2018, ECCV.

[7]  Ivan Laptev,et al.  Weakly-Supervised Learning of Visual Relations , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[9]  Mohan S. Kankanhalli,et al.  Learning to Detect Human-Object Interactions With Knowledge , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Wenyu Liu,et al.  Multiple Instance Detection Network with Online Instance Classifier Refinement , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Cewu Lu,et al.  Transferable Interactiveness Knowledge for Human-Object Interaction Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[14]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Abhinav Gupta,et al.  Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Cordelia Schmid,et al.  Weakly Supervised Learning of Interactions between Humans and Objects , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Shiguang Shan,et al.  Weakly Supervised Object Detection With Segmentation Collaboration , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Qi Tian,et al.  Zigzag Learning for Weakly Supervised Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  M. Davare,et al.  Interactions between dorsal and ventral streams for controlling skilled grasp , 2015, Neuropsychologia.

[20]  Fei-Fei Li,et al.  Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Jitendra Malik,et al.  Contextual Action Recognition with R*CNN , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Rama Chellappa,et al.  Detecting Human-Object Interactions via Functional Generalization , 2019, AAAI.

[23]  Wenyu Liu,et al.  PCL: Proposal Cluster Learning for Weakly Supervised Object Detection , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Ramakant Nevatia,et al.  Activity Driven Weakly Supervised Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Song Wang,et al.  Weakly supervised easy-to-hard learning for object detection in image sequences , 2020, Neurocomputing.

[26]  Jia Deng,et al.  Learning to Detect Human-Object Interactions , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[27]  Kris M. Kitani,et al.  Going Deeper into First-Person Activity Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Kaiming He,et al.  Detecting and Recognizing Human-Object Interactions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Ivan Laptev,et al.  ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization , 2016, ECCV.

[30]  Xuming He,et al.  Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Andrea Vedaldi,et al.  Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Hyunjung Shim,et al.  Attention-Based Dropout Layer for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Nicu Sebe,et al.  Self Paced Deep Learning for Weakly Supervised Object Detection , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.