Hierarchical transfer learning for online recognition of compound actions

A novel online action recognition method for fast detection of compound actions is proposed.
A key contribution is a transfer learning strategy from simple to complex datasets.
Another key contribution is an automatically configured hierarchical body model.
Experimental results show an improvement in action recognition performance of 16%.
The proposed algorithm runs in real time with an average latency of just 2 frames.

Recognising human actions in real time can provide users with a natural user interface (NUI), enabling a range of innovative and immersive applications. A NUI application should not restrict users' movements; it should allow users to transition between actions in quick succession, which we term compound actions. However, the majority of action recognition researchers have focused on individual actions, so their approaches are limited to recognising single actions or multiple actions that are temporally separated.

This paper proposes a novel online action recognition method for fast detection of compound actions. A key contribution is our hierarchical body model, which can be automatically configured to detect each action from the low-level body parts that are most discriminative for that action. Another key contribution is a transfer learning strategy that allows action segmentation and whole-body modelling to be performed on a related but simpler dataset, combined with automatic adaptation of the hierarchical body model on a more complex target dataset.

Experimental results on a challenging and realistic dataset show an improvement in action recognition performance of 16% due to the introduction of our hierarchical transfer learning. The proposed algorithm is fast, with an average latency of just 2 frames (66 ms), and outperforms state-of-the-art algorithms that are capable of fast online action recognition.
