Empirical Study of Audio-Visual Features Fusion for Gait Recognition

The goal of this paper is to evaluate how the fusion of audio and visual features can help in the challenging task of identifying people by their gait (i.e. the way they walk), known as gait recognition. Most previous research on gait recognition has focused on designing visual descriptors, mainly computed on binary silhouettes, or on building sophisticated machine learning frameworks. However, little attention has been paid to the audio patterns associated with the action of walking. We therefore propose and evaluate a multimodal system for gait recognition. The proposed approach is evaluated on the challenging ‘TUM GAID’ dataset, which contains audio recordings in addition to image sequences. The experimental results show that using late fusion to combine two kinds of tracklet-based visual features with audio features improves on the state-of-the-art results for the standard experiments defined on the dataset.
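As a rough illustration of what late fusion means here, the sketch below combines per-subject scores from three independently trained classifiers (two visual, one audio) into a single decision. The abstract does not state the fusion rule, so the weighted sum of softmax-normalized scores is an assumption, and the function names, modality names, weights and score values are all hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Map raw classifier scores to values that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(scores_by_modality: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-modality normalized scores (assumed rule)."""
    weight_sum = sum(weights[name] for name in scores_by_modality)
    n_classes = len(next(iter(scores_by_modality.values())))
    fused = [0.0] * n_classes
    for name, scores in scores_by_modality.items():
        probs = softmax(scores)
        for i, p in enumerate(probs):
            fused[i] += weights[name] / weight_sum * p
    return fused


def predict(fused: List[float]) -> int:
    """Return the index of the subject with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical scores for three enrolled subjects from each classifier.
scores = {
    "visual_a": [2.0, 1.0, 0.1],
    "visual_b": [1.5, 1.6, 0.2],
    "audio": [0.2, 2.5, 0.3],
}
weights = {"visual_a": 1.0, "visual_b": 1.0, "audio": 1.0}

fused = late_fusion(scores, weights)
predicted_subject = predict(fused)
```

Each modality is classified separately and only the scores are merged, so a confident audio classifier can overturn a weak visual decision. In practice the weights would be tuned on held-out data.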
