Bio-inspired Dynamic 3D Discriminative Skeletal Features for Human Action Recognition

Over the last few years, with the immense popularity of the Kinect, there has been renewed interest in developing methods for human gesture and action recognition from 3D data. A number of approaches have been proposed that extract representative features from 3D depth data, from a reconstructed 3D surface mesh, or, more commonly, from the recovered estimate of the human skeleton. Recent work in neuroscience has discovered a neural encoding of static 3D shapes in primate inferotemporal cortex that can be represented as a hierarchy of medial axis and surface features. We hypothesize that a similar neural encoding might also exist for 3D shapes in motion, and we propose a hierarchy of dynamic medial axis structures at several spatio-temporal scales that can be modeled using a set of Linear Dynamical Systems (LDSs). We then propose novel discriminative metrics for comparing these sets of LDSs for the task of human activity recognition. Combined with simple classification frameworks, our proposed features and corresponding hierarchical dynamical models achieve higher human activity recognition rates than state-of-the-art methods on several skeletal datasets.
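To make the idea of modeling a skeletal feature trajectory with an LDS and then comparing two such models concrete, here is a minimal sketch. It is not the method or the metrics proposed in this paper: it assumes the standard PCA-based (suboptimal) identification used for dynamic textures, and a finite-horizon subspace-angle distance in the spirit of the Martin distance. The function names `fit_lds`, `observability`, and `subspace_angle_distance`, the state dimension, and the horizon are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_lds(Y, n_states):
    """Fit x[t+1] = A x[t], y[t] = C x[t] to a (p, T) feature trajectory Y.

    Assumption: PCA-based identification (SVD of the centered data),
    not the estimation procedure of the paper.
    """
    Y = np.asarray(Y, dtype=float)
    Y0 = Y - Y.mean(axis=1, keepdims=True)           # remove the temporal mean
    U, s, Vt = np.linalg.svd(Y0, full_matrices=False)
    C = U[:, :n_states]                              # observation matrix (p, n)
    X = s[:n_states, None] * Vt[:n_states]           # hidden states (n, T)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])         # least-squares dynamics (n, n)
    return A, C


def observability(A, C, horizon):
    """Stack C, CA, ..., CA^(horizon-1) into a finite observability matrix."""
    blocks = [C]
    for _ in range(horizon - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)


def subspace_angle_distance(lds1, lds2, horizon=10):
    """Distance from principal angles between two finite observability subspaces.

    Assumption: a finite-horizon stand-in for the Martin distance,
    d^2 = -ln prod_i cos^2(theta_i); both models must share the same
    observation dimension p.
    """
    O1 = observability(lds1[0], lds1[1], horizon)
    O2 = observability(lds2[0], lds2[1], horizon)
    Q1, _ = np.linalg.qr(O1)                         # orthonormal bases
    Q2, _ = np.linalg.qr(O2)
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    cosines = np.clip(cosines, 1e-12, 1.0)           # guard log(0) and round-off
    return float(np.sqrt(-2.0 * np.sum(np.log(cosines))))
```

Under these assumptions, one LDS would be fitted per body part and temporal scale, and the resulting pairwise distances could be passed to a simple classifier such as nearest neighbor. The discriminative metrics proposed in the paper would take the place of `subspace_angle_distance`.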
