Multi-View Gait Recognition Based on a Spatial-Temporal Deep Neural Network

This paper proposes a novel spatial–temporal deep neural network (STDNN) for multi-view gait recognition. The STDNN comprises a temporal feature network (TFN) and a spatial feature network (SFN). In the TFN, a feature sub-network extracts low-level edge features from gait silhouettes. These features are fed to the spatial–temporal gradient (STG) network, which uses an STG unit and a long short-term memory unit to extract STG features. In the SFN, multilayer convolutional neural networks extract the spatial features of a gait sequence from its gait energy image. The SFN is optimized jointly by a classification loss and a verification loss, which makes inter-class variations larger than intra-class variations. After training, the TFN and the SFN extract temporal and spatial features, respectively, for multi-view gait recognition. Finally, the predicted probabilities of the two networks are combined to identify individuals by the differences in their gaits. The STDNN is evaluated extensively on the CASIA-B, OU-ISIR, and CMU MoBo data sets. Its best recognition scores are 95.67% under an identical view, 93.64% under cross-view conditions, and 92.54% under multi-view conditions. The STDNN is also compared with state-of-the-art approaches in various situations. The results show that it outperforms the other methods, which suggests its potential for practical applications.
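Two steps named in the abstract can be illustrated with a minimal sketch: building a gait energy image (the input to the SFN) and combining the predicted probabilities of the TFN and the SFN. The gait energy image is computed here in its usual form, as the pixel-wise mean of aligned binary silhouettes over a gait cycle. The abstract does not state how the two probability vectors are combined, so the weighted average below, the weight `alpha`, and all function names are assumptions made for illustration only, not the authors' implementation.

```python
from typing import List, Sequence

# A binary silhouette frame: 1 = body pixel, 0 = background.
Frame = List[List[int]]


def gait_energy_image(silhouettes: Sequence[Frame]) -> List[List[float]]:
    """Pixel-wise mean of aligned, equally sized binary silhouettes."""
    if not silhouettes:
        raise ValueError("need at least one silhouette frame")
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    n = len(silhouettes)
    return [
        [sum(frame[r][c] for frame in silhouettes) / n for c in range(width)]
        for r in range(height)
    ]


def fuse_probabilities(
    p_temporal: Sequence[float],
    p_spatial: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Assumed fusion rule: convex combination of two probability vectors.

    alpha is a hypothetical weight on the temporal (TFN) prediction;
    the source does not specify the actual combination rule.
    """
    if len(p_temporal) != len(p_spatial):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(p_temporal, p_spatial)]


def predict_identity(
    p_temporal: Sequence[float],
    p_spatial: Sequence[float],
    alpha: float = 0.5,
) -> int:
    """Return the index of the identity with the highest fused probability."""
    fused = fuse_probabilities(p_temporal, p_spatial, alpha)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Two tiny 2x2 silhouettes standing in for a gait cycle.
    frames = [[[0, 1], [1, 1]], [[0, 0], [1, 1]]]
    print(gait_energy_image(frames))
    # Hypothetical per-identity probabilities from the two networks.
    print(predict_identity([0.7, 0.2, 0.1], [0.2, 0.5, 0.3]))
```

A convex combination keeps the fused vector a valid probability distribution whenever both inputs are, which is why it is a common default for late fusion; other rules, such as a product of probabilities, would be equally consistent with the abstract.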
