Spatio-temporal silhouette sequence reconstruction for gait recognition against occlusion

Gait-based features make it possible to recognize a subject even from a low-resolution image sequence, and they can be captured at a distance without the subject’s cooperation. Person recognition using gait-based features (gait recognition) is therefore promising for real-life applications. However, parts of a subject’s body are often occluded by beams, pillars, cars, trees, or other walking people, so approaches that require an unoccluded gait image sequence are not applicable. Occlusion handling is thus a challenging but important issue for gait recognition. In this paper, we propose silhouette sequence reconstruction from an occluded sequence (sVideo) based on a conditional deep generative adversarial network (GAN). From the reconstructed sequence, we estimate the gait cycle and extract gait features from a one-gait-cycle image sequence. To regularize the training of the proposed generative network, we use an adversarial loss based on a triplet hinge loss incorporating the Wasserstein GAN (WGAN-hinge). To the best of our knowledge, WGAN-hinge is the first adversarial loss that supervises the generator network during training by incorporating pairwise similarity ranking information. The proposed approach was evaluated on multiple challenging occlusion patterns. The experimental results demonstrate that it outperforms the existing state-of-the-art benchmarks.
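As a rough illustration of the step in which the gait cycle is estimated from a silhouette sequence and features are extracted from one cycle, the following minimal sketch may help. It assumes a common approach from the gait literature (normalized autocorrelation along the time axis, followed by averaging the silhouettes over one period); the abstract does not state which estimator or feature the paper actually uses, and the names `estimate_gait_period` and `one_cycle_average` are hypothetical.

```python
import numpy as np


def estimate_gait_period(silhouettes, min_period=2, max_period=None):
    """Return the frame shift with the highest normalized autocorrelation.

    Assumption: `silhouettes` is an array of shape (frames, height, width)
    holding binary or grayscale silhouettes. This is an illustrative
    estimator, not necessarily the one used in the paper.
    """
    seq = np.asarray(silhouettes, dtype=np.float64)
    n_frames = seq.shape[0]
    if max_period is None:
        max_period = n_frames // 2

    best_shift, best_score = min_period, -np.inf
    for shift in range(min_period, max_period + 1):
        a = seq[: n_frames - shift]   # frames t
        b = seq[shift:]               # frames t + shift
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        score = (a * b).sum() / denom if denom > 0 else 0.0
        # Strict comparison keeps the smallest shift among equal scores.
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


def one_cycle_average(silhouettes, period, start=0):
    """Average the silhouettes over one estimated gait cycle.

    Assumption: a simple per-pixel mean over `period` frames is used as
    the one-cycle feature for illustration.
    """
    seq = np.asarray(silhouettes, dtype=np.float64)
    return seq[start:start + period].mean(axis=0)
```

In practice, the estimator would be applied to the reconstructed sequence rather than the occluded one, since occluded frames distort the autocorrelation. Real walking sequences can also show a strong peak at half the gait cycle (one step), so the search range may need to be restricted accordingly.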
