Exploring Deep Models for Practical Gait Recognition

Gait recognition is a rapidly advancing vision technique for person identification from a distance. Prior studies predominantly employed relatively small and shallow neural networks to extract subtle gait features, achieving impressive successes in indoor settings. Nevertheless, experiments revealed that these existing methods mostly produce unsatisfactory results when applied to newly released in-the-wild gait datasets. This paper presents a unified perspective to explore how to construct deep models for state-of-the-art outdoor gait recognition, including the classical CNN-based and emerging Transformer-based architectures. Consequently, we emphasize the importance of suitable network capacity, explicit temporal modeling, and deep transformer structure for discriminative gait representation learning. Our proposed CNN-based DeepGaitV2 series and Transformer-based SwinGait series exhibit significant performance gains in outdoor scenarios, e.g., about +30% rank-1 accuracy compared with many state-of-the-art methods on the challenging GREW dataset. This work is expected to further boost the research and application of gait recognition. Code will be available at https://github.com/ShiqiYu/OpenGait.
