RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection

Human-Object Interaction (HOI) detection devotes to learn how humans interact with surrounding objects. Latest end-to-end HOI detectors are short of relation reasoning, which leads to inability to learn HOI-specific interactive semantics for predictions. In this paper, we therefore propose novel relation reasoning for HOI detection. We first present a progressive Relation-aware Frame, which brings a new structure and parameter sharing pattern for interaction inference. Upon the frame, an Interaction Intensifier Module and a Correlation Parsing Module are carefully designed, where: a) interactive semantics from humans can be exploited and passed to objects to intensify interactions, b) interactive correlations among humans, objects and interactions are integrated to promote predictions. Based on modules above, we construct an end-toend trainable framework named Relation Reasoning Network (abbr. RR-Net). Extensive experiments show that our proposed RR-Net sets a new state-of-the-art on both V-COCO and HICO-DET benchmarks and improves the baseline about 5.5% and 9.8% relatively, validating that this first effort in exploring relation reasoning and integrating interactive semantics has brought obvious improvement for end-to-end HOI detection.

[1]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Song-Chun Zhu,et al.  Learning Human-Object Interactions by Graph Parsing Neural Networks , 2018, ECCV.

[3]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[4]  Jian Zhang,et al.  GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency , 2020, Neurocomputing.

[5]  Jorma Laaksonen,et al.  Deep Contextual Attention for Human-Object Interaction Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Cewu Lu,et al.  Transferable Interactiveness Knowledge for Human-Object Interaction Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Trevor Darrell,et al.  Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Fahad Shahbaz Khan,et al.  Learning Human-Object Interaction Detection Using Interaction Points , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Ali Farhadi,et al.  Situation Recognition: Visual Semantic Role Labeling for Image Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Fei Wang,et al.  PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Chen Gao,et al.  iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection , 2018, BMVC.

[12]  Mohan S. Kankanhalli,et al.  Interact as You Intend: Intention-Driven Human-Object Interaction Detection , 2018, IEEE Transactions on Multimedia.

[13]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[14]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Xuming He,et al.  Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Derek Hoiem,et al.  No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Kaiming He,et al.  Detecting and Recognizing Human-Object Interactions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Yuexian Zou,et al.  A Graph-based Interactive Reasoning for Human-Object Interaction Detection , 2020, IJCAI.

[19]  Ruslan Salakhutdinov,et al.  Action Recognition using Visual Attention , 2015, NIPS 2015.

[20]  B. S. Manjunath,et al.  VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Ge Li,et al.  C-RPNs: Promoting Object Detection in real world via a Cascade Structure of Region Proposal Networks , 2019, Neurocomputing.

[22]  Cewu Lu,et al.  Pairwise Body-Part Attention for Recognizing Human-Object Interactions , 2018, ECCV.

[23]  Mingmin Chi,et al.  Relation Parsing Neural Network for Human-Object Interaction Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).