Paper Information - POINet: Pose-Guided Ovonic Insight Network for Multi-Person Pose Tracking

Multi-person pose tracking aims to jointly estimate and track the keypoints of multiple people in unconstrained videos. The most popular solution follows the tracking-by-detection strategy, which relies on human detection and data association. While human detection has been boosted by deep learning, existing works mainly realize data association through several separate stages with hand-crafted metrics, leading to great uncertainty and poor adaptability in complex scenes. To handle these problems, we propose an end-to-end pose-guided ovonic insight network (POINet) for data association in multi-person pose tracking, which jointly learns feature extraction, similarity estimation, and identity assignment. Specifically, we design a pose-guided representation network that integrates pose information into hierarchical convolutional features, generating a pose-aligned representation for each person, which helps handle partial occlusions. Moreover, we propose an ovonic insight network that adaptively encodes the cross-frame identity transformation and can cope with the difficult tracking cases of people leaving and entering the scene. Overall, the proposed POINet offers a new way to realize multi-person pose tracking in an end-to-end fashion. Extensive experiments on the PoseTrack benchmark demonstrate that POINet outperforms state-of-the-art methods.
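To make the data-association problem concrete, the following is a minimal sketch of cross-frame identity assignment, including people who leave or enter the scene. It is not POINet: it uses a fixed cosine-similarity threshold and greedy matching, which is the kind of separate-stage, hand-crafted pipeline the abstract contrasts against, whereas POINet learns feature extraction, similarity estimation, and assignment jointly. The function names, the `min_sim` threshold, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def associate(prev_feats, curr_feats, min_sim=0.5):
    """Greedily match people in the previous frame to people in the current frame.

    Returns (matches, left, entered):
      matches: dict mapping previous-frame index -> current-frame index
      left:    previous-frame indices with no match (person left the scene)
      entered: current-frame indices with no match (person entered the scene)
    min_sim is a hand-set threshold (an assumption, not taken from the paper).
    """
    # Score every cross-frame pair, most similar first.
    pairs = sorted(
        (
            (cosine(p, c), i, j)
            for i, p in enumerate(prev_feats)
            for j, c in enumerate(curr_feats)
        ),
        reverse=True,
    )
    matches, used_prev, used_curr = {}, set(), set()
    for sim, i, j in pairs:
        if sim < min_sim:
            break  # every remaining pair is even less similar
        if i in used_prev or j in used_curr:
            continue  # each identity is assigned at most once
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    left = [i for i in range(len(prev_feats)) if i not in used_prev]
    entered = [j for j in range(len(curr_feats)) if j not in used_curr]
    return matches, left, entered


# Toy example: person 1 persists, person 0 leaves, a new person enters.
prev_frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
curr_frame = [[0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
matches, left, entered = associate(prev_frame, curr_frame)
print(matches, left, entered)  # {1: 0} [0] [1]
```

The fixed threshold and greedy order are exactly the hand-tuned choices that tend to break in crowded or occluded scenes, which is the motivation the abstract gives for learning the association end to end.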
