Action Recognition Using Local Visual Descriptors and Inertial Data

Different body sensors and modalities can be used in human action recognition, either separately or in combination. In this work, we use inertial measurement units (IMUs) worn on the left and right hands together with first-person (egocentric) vision for human action recognition. We propose a novel statistical feature extraction method based on the curvature of the graph of a function, applied to the tracked positions of the left and right hands in space. Local visual descriptors are used as features for the egocentric video, and an intermediate (feature-level) fusion of the IMU and visual features is performed. Despite using only two IMU sensors together with egocentric vision, our method achieves a classification accuracy of 99.61% over nine different actions. These results suggest that careful feature extraction can play a vital role in human action recognition with a limited number of sensors, making our approach a promising one.
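To make the curvature-based idea concrete, the Python sketch below computes the per-sample curvature of a one-dimensional IMU channel, kappa = |y''| / (1 + y'^2)^(3/2), and summarizes it with a few statistics per window. This is only a minimal sketch: the window length, sampling rate, and the specific statistics (mean, standard deviation, maximum, median) are illustrative assumptions, not the exact feature set used in the paper.

```python
import numpy as np

def curvature(signal, dt=1.0):
    """Curvature of the graph of a 1-D signal y(t):
    kappa = |y''| / (1 + y'^2)^(3/2), via finite differences."""
    dy = np.gradient(signal, dt)
    d2y = np.gradient(dy, dt)
    return np.abs(d2y) / (1.0 + dy ** 2) ** 1.5

def curvature_features(window, dt=1.0):
    """Summary statistics of per-sample curvature over one sliding window
    of a single IMU channel (hypothetical feature set for illustration)."""
    k = curvature(window, dt)
    return np.array([k.mean(), k.std(), k.max(), np.median(k)])

# Example: one 2-second window of a 50 Hz channel (synthetic data).
rng = np.random.default_rng(0)
window = np.cumsum(rng.normal(size=100))
print(curvature_features(window, dt=1.0 / 50))
```

In a full pipeline, such statistics would be computed per window and per hand (left and right IMU) and stacked into one feature vector per time window.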

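The sketch below shows one plausible reading of intermediate (feature-level) fusion: the per-window IMU statistics are concatenated with the corresponding local visual descriptor and passed to a standard classifier. The feature dimensions, the SVM classifier, and the synthetic data are illustrative assumptions only, not the configuration reported in the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def intermediate_fusion(imu_features, visual_descriptors):
    """Feature-level (intermediate) fusion: concatenate per-window IMU
    statistics with the matching visual descriptor before classification."""
    return np.concatenate([imu_features, visual_descriptors], axis=1)

# Hypothetical shapes: 500 windows, 8 IMU statistics (two hands combined),
# 108-D local visual descriptor per window, nine action classes.
rng = np.random.default_rng(1)
X_imu = rng.normal(size=(500, 8))
X_vis = rng.normal(size=(500, 108))
y = rng.integers(0, 9, size=500)

X = intermediate_fusion(X_imu, X_vis)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.score(X, y))
```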