Learning Gait Representation From Massive Unlabelled Walking Videos: A Benchmark

Gait captures an individual's unique and distinguishing walking pattern and has become one of the most promising biometric features for human identification. As a fine-grained recognition task, gait recognition is easily affected by many factors and usually requires a large amount of fully annotated data, which is costly to collect and never sufficient. This paper proposes a large-scale self-supervised benchmark for gait recognition with contrastive learning. It aims to learn a general gait representation from massive unlabelled walking videos for practical applications by offering informative walking priors and diverse real-world variations. Specifically, we collect a large-scale unlabelled gait dataset, GaitLU-1M, consisting of 1.02M walking sequences, and propose a conceptually simple yet empirically powerful baseline model, GaitSSB. Experimentally, we evaluate the pre-trained model on four widely used gait benchmarks, CASIA-B, OU-MVLP, GREW and Gait3D, with and without transfer learning. The unsupervised results are comparable to or even better than those of the early model-based and GEI-based methods. After transfer learning, GaitSSB outperforms existing methods by a large margin in most cases and also shows superior generalization capacity. Further experiments indicate that pre-training can save about 50% and 80% of the annotation costs of GREW and Gait3D, respectively. Theoretically, we discuss the critical issues for a gait-specific contrastive framework and present some insights for further study. As far as we know, GaitLU-1M is the first large-scale unlabelled gait dataset, and GaitSSB is the first method that achieves remarkable unsupervised results on the aforementioned benchmarks. The source code of GaitSSB and the anonymized data of GaitLU-1M are available at https://github.com/ShiqiYu/OpenGait.
