CuDi3D: Curvilinear displacement based approach for online 3D action detection

Interactively detecting and recognizing 3D actions from skeleton data in unsegmented streams has become an important topic in computer vision. It raises three scientific problems related to variability. The first is temporal variability, which occurs when subjects perform gestures at different speeds. The second is inter-class spatial variability, which refers to disparities between the amounts of displacement induced by different classes (i.e., long vs. short movements). The third is intra-class spatial variability, caused by differences in style and gesture amplitude. In this paper, we design an original approach that better accounts for these three issues. First, to address temporal variability, we introduce the notion of curvilinear segmentation: features are extracted not on time-based sliding windows but on trajectory segments whose cumulative displacement equals a class-specific amount. Second, to tackle inter-class spatial variability, we define several competing classifiers, each with its own dedicated curvilinear window. Last, we address intra-class spatial variability by designing a fusion system that takes into account the decisions and confidence scores of every competing classifier. Extensive experiments on four challenging skeleton-based datasets demonstrate the relevance of the proposed approach for action recognition and online action detection.
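
The idea of a curvilinear window can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`frame_displacement`, `curvilinear_window`), the choice of summing per-joint Euclidean distances as the displacement measure, and the `budget` parameter standing in for the class-based amount are all assumptions made for illustration. Starting from the most recent frame, the sketch walks backwards through the stream until the cumulative displacement reaches the budget.

```python
import math

def frame_displacement(prev, curr):
    # Assumed measure: sum over joints of the Euclidean distance
    # travelled between two consecutive frames.
    return sum(math.dist(p, q) for p, q in zip(prev, curr))

def curvilinear_window(frames, end, budget):
    # Walk backwards from frame `end` and return the start index of the
    # segment whose cumulative displacement first reaches `budget`.
    # Returns None if the stream has not yet moved that far.
    total = 0.0
    for i in range(end, 0, -1):
        total += frame_displacement(frames[i - 1], frames[i])
        if total >= budget:
            return i - 1
    return None

# One joint moving along x: slowly (1 unit/frame) and quickly (3 units/frame).
slow = [[(1.0 * t, 0.0, 0.0)] for t in range(11)]
fast = [[(3.0 * t, 0.0, 0.0)] for t in range(11)]

print(curvilinear_window(slow, 10, 3.0))  # 7: the window spans frames 7..10
print(curvilinear_window(fast, 10, 3.0))  # 9: the window spans frames 9..10
```

Both windows cover the same amount of motion although they differ in frame count, which is how a displacement-based window absorbs differences in execution speed. Under the same assumptions, running one such window per class, each with its own budget, would correspond to the competing classifiers described above.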
