Human Action Recognition from the First-Person Point of View: A Case Study of Two Basic Tasks

In this study, a human motion dataset is built from indoor and outdoor actions recorded with a head-mounted camera, with an Xsens system tracking the motions. The dataset is structured so that it can be used to feed frame sequences to a deep neural network and to order the frames within each performed task (washing, eating, etc.). Finally, a 3D model of the person is estimated at every frame, using a structure comparable to that of the first network. The dataset comprises more than 120,000 frames taken from 7 different people, each performing different tasks in diverse indoor and outdoor scenarios. The frame sequences of every video were synchronized with the 3D data and segmented into 23 parts.
