Paper Information - Category Query Learning for Human-Object Interaction Classification

Unlike most previous HOI methods, which focus on learning better human-object features, we propose a novel and complementary approach called category query learning. The queries are explicitly associated with interaction categories, converted into image-specific category representations via a transformer decoder, and learnt via an auxiliary image-level classification task. The idea is motivated by an earlier multi-label image classification method, but is applied here for the first time to the challenging human-object interaction classification task. Our method is simple, general, and effective. It is validated on three representative HOI baselines and achieves new state-of-the-art results on two benchmarks.
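
The pipeline described in the abstract (one query per interaction category, a decoder that turns each query into an image-specific representation, and an auxiliary image-level classification loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single cross-attention step with no learned projections, layer norm, or feed-forward block in place of a full transformer decoder, and it uses binary cross-entropy as a stand-in for the auxiliary loss. All names, shapes, and random values (`category_query_forward`, `image_level_bce`, `C`, `N`, `d`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def category_query_forward(queries, image_feats, w_cls, b_cls):
    """Simplified stand-in for a transformer decoder (assumption: one
    single-head cross-attention step, no projections or residuals).

    queries:     (C, d) one learnable vector per interaction category
    image_feats: (N, d) flattened image features from some backbone
    w_cls:       (C, d) hypothetical per-category classifier weights
    b_cls:       (C,)   hypothetical per-category classifier biases
    """
    d = queries.shape[1]
    # Each category query attends over all image locations.
    attn = softmax(queries @ image_feats.T / np.sqrt(d), axis=-1)  # (C, N)
    # Image-specific category representations.
    category_repr = attn @ image_feats                             # (C, d)
    # One image-level score per category.
    logits = (category_repr * w_cls).sum(axis=1) + b_cls           # (C,)
    return category_repr, logits


def image_level_bce(logits, labels):
    # Multi-label binary cross-entropy over categories (assumed loss).
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12
    per_category = labels * np.log(probs + eps) + (1.0 - labels) * np.log(1.0 - probs + eps)
    return float(-per_category.mean())


# Toy inputs: 5 interaction categories, 12 image locations, 8 channels.
rng = np.random.default_rng(0)
C, N, d = 5, 12, 8
queries = rng.normal(size=(C, d))
image_feats = rng.normal(size=(N, d))
w_cls = rng.normal(size=(C, d))
b_cls = np.zeros(C)
# Multi-hot label: which interaction categories occur anywhere in the image.
labels = np.array([1.0, 0.0, 0.0, 1.0, 0.0])

category_repr, logits = category_query_forward(queries, image_feats, w_cls, b_cls)
loss = image_level_bce(logits, labels)
```

In a real model the queries and classifier weights would be trained jointly with the HOI baseline, and how `category_repr` is then used by the baseline's interaction classifier is left open here because the abstract does not specify it.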

Yichen Wei | Yue Hu | Fangao Zeng | Chi Xie | Shuang Liang
