Bio-inspired predictive orientation decomposition of skeleton trajectories for real-time human activity prediction

Activity prediction, which aims to infer ongoing human activities from incomplete observations, is an essential task in practical human-centered robotics applications such as security and assisted living. To address this challenging problem, we introduce a novel bio-inspired predictive orientation decomposition (BIPOD) approach to construct representations of people from 3D skeleton trajectories. Our approach is inspired by biological research in human anatomy. To capture the spatio-temporal information of human motion, we spatially decompose 3D human skeleton trajectories and project them onto the three anatomical planes (i.e., the coronal, transverse, and sagittal planes); we then describe the short-term temporal information of joint motions and encode high-order temporal dependencies. By estimating future skeleton trajectories that are not yet observed, we endow the BIPOD representation with a critical predictive capability. Empirical studies on a physical TurtleBot2 robotic platform validate that our BIPOD approach achieves promising accuracy and efficiency in recognizing ongoing human activities. Experiments on benchmark datasets further demonstrate that our new BIPOD representation significantly outperforms previous approaches for real-time activity classification and prediction from 3D human skeleton trajectories.
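The core of the decomposition described above can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's implementation: it assumes a body-centered coordinate frame with x = left-right, y = back-front, z = down-up, under which each anatomical plane corresponds to dropping one axis, and it uses displacement orientations as a simple stand-in for the paper's short-term motion descriptor.

```python
import numpy as np

def project_onto_anatomical_planes(joints):
    """Project 3D joint positions (N x 3, body-centered frame) onto the
    three anatomical planes. Axis convention is an assumption:
    x = left-right, y = back-front, z = down-up."""
    joints = np.asarray(joints, dtype=float)
    return {
        # sagittal plane divides left from right -> keep (y, z)
        "sagittal": joints[:, [1, 2]],
        # coronal plane divides front from back -> keep (x, z)
        "coronal": joints[:, [0, 2]],
        # transverse plane divides top from bottom -> keep (x, y)
        "transverse": joints[:, [0, 1]],
    }

def orientation_angles(traj_2d):
    """Orientations (degrees) of successive displacement vectors of one
    joint's 2D trajectory (T x 2) -- a simple stand-in for the per-plane
    short-term motion descriptor."""
    d = np.diff(np.asarray(traj_2d, dtype=float), axis=0)
    return np.degrees(np.arctan2(d[:, 1], d[:, 0]))
```

In a full pipeline, the angles from each plane would be accumulated into per-plane histograms over a temporal window, and a predictor (the paper estimates unobserved future joint positions) would extend each trajectory before histogramming, giving the representation its predictive character.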
