Evaluation of CNN Architectures for Gait Recognition Based on Optical Flow Maps

This work targets the identification of people in video based on the way they walk (i.e., gait) using deep learning architectures. We explore the use of convolutional neural networks (CNNs) for learning high-level descriptors from low-level motion features (i.e., optical flow components). The low number of training samples per subject and the use of a test set containing subjects different from the training ones make the search for a good CNN architecture a challenging task. We carry out a thorough experimental evaluation, deploying and analyzing four distinct CNN models with different depths but similar complexity. We show that even the simplest CNN models greatly improve the results using shallow classifiers. All our experiments have been carried out on the challenging TUM-GAID dataset, which contains people in different covariate scenarios (i.e., clothing, shoes, bags).
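The subject-disjoint evaluation mentioned above (test subjects that never appear in training) can be illustrated with a minimal sketch. The function name `subject_disjoint_split`, the toy sample list, and the 30% test fraction are assumptions made for illustration only; they are not the partitioning protocol used for TUM-GAID in this work.

```python
import random
from collections import defaultdict


def subject_disjoint_split(samples, test_fraction=0.5, seed=0):
    """Split (subject_id, sequence) pairs so that no subject is in both sets.

    Hypothetical helper: the split is made over subject identities,
    not over individual sequences.
    """
    # Group sequences by subject to get the list of identities.
    by_subject = defaultdict(list)
    for subject_id, sequence in samples:
        by_subject[subject_id].append(sequence)

    # Shuffle identities deterministically and reserve a share for testing.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])

    train = [(s, seq) for s, seq in samples if s not in test_subjects]
    test = [(s, seq) for s, seq in samples if s in test_subjects]
    return train, test


# Toy data (assumed): 10 subjects with 4 walking sequences each.
samples = [(f"p{i:03d}", f"seq_{i}_{k}") for i in range(10) for k in range(4)]
train, test = subject_disjoint_split(samples, test_fraction=0.3)

train_ids = {s for s, _ in train}
test_ids = {s for s, _ in test}
print(len(train_ids), "training subjects,", len(test_ids), "test subjects")
```

Splitting by identity rather than by sequence is what prevents a model from being scored on people it has already seen, which is one reason the architecture search described above is hard with few samples per subject.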
