Robust multi-dimensional motion features for first-person vision activity recognition

We design a set of multi-dimensional motion features from first-person video. We extract virtual inertial data from video only. We combine motion magnitude, direction and dynamics with virtual inertial data. The features are independent of the classifier and validated on multiple datasets. Two new datasets are made available to the research community.

We propose robust multi-dimensional motion features for human activity recognition from first-person videos. The proposed features encode information about motion magnitude, direction and variation, and combine them with virtual inertial data generated from the video itself. The use of a grid flow representation, per-frame normalization and temporal feature accumulation enhances the robustness of the representation. Results on multiple datasets demonstrate that the proposed feature representation outperforms existing motion features and, importantly, that it does so independently of the classifier. Moreover, the proposed multi-dimensional motion features are general enough to be suitable for vision tasks beyond those related to wearable cameras.
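The three robustness ingredients named in the abstract (grid flow, per-frame normalization, temporal accumulation) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`grid_flow`, `frame_descriptor`, `accumulate`), the magnitude-weighted direction histogram, the number of bins and the grid size are all assumptions chosen for illustration. The dense flow field is assumed to be already available as a nested list of `(dx, dy)` pairs.

```python
import math

def grid_flow(flow, rows, cols):
    """Average a dense flow field (H x W nested list of (dx, dy)) over a
    rows x cols grid. Assumes H >= rows and W >= cols (hypothetical helper)."""
    h, w = len(flow), len(flow[0])
    cells = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            n = (y1 - y0) * (x1 - x0)
            sx = sum(flow[y][x][0] for y in range(y0, y1) for x in range(x0, x1))
            sy = sum(flow[y][x][1] for y in range(y0, y1) for x in range(x0, x1))
            cells.append((sx / n, sy / n))
    return cells

def frame_descriptor(cells, n_bins=8):
    """Magnitude-weighted direction histogram of the grid vectors,
    normalized per frame so that its bins sum to 1 (illustrative choice)."""
    hist = [0.0] * n_bins
    for dx, dy in cells:
        mag = math.hypot(dx, dy)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Clamp guards against angle rounding up to exactly 2*pi.
        b = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
        hist[b] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist

def accumulate(descriptors):
    """Average the per-frame descriptors over a temporal window."""
    n = len(descriptors)
    return [sum(d[i] for d in descriptors) / n for i in range(len(descriptors[0]))]

# Usage: two synthetic 4x4 flow fields, one moving along +x, one along +y.
frame_a = [[(1.0, 0.0)] * 4 for _ in range(4)]
frame_b = [[(0.0, 2.0)] * 4 for _ in range(4)]
window = [frame_descriptor(grid_flow(f, 2, 2)) for f in (frame_a, frame_b)]
feature = accumulate(window)
```

Averaging over grid cells damps noisy per-pixel vectors, the per-frame normalization removes the dependence on overall motion scale, and the temporal average smooths frame-to-frame fluctuations. This is one plausible reading of why these steps add robustness.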
