Fusing Object Information and Inertial Data for Activity Recognition

In the field of pervasive computing, wearable devices have been widely used for recognizing human activities. One important area in this research is the recognition of activities of daily living, where inertial sensors and interaction sensors (such as RFID tags with scanners) are particularly popular data sources. Using interaction sensors, however, has one drawback: they may not differentiate between a genuine interaction and merely touching an object. A positive signal from an interaction sensor is therefore not necessarily caused by a performed activity, e.g., when an object is touched but no interaction follows. Yet many scenarios, such as medicine intake, rely heavily on correctly recognized activities. In our work, we aim to address this limitation and present a multimodal, egocentric-vision-based activity recognition approach. Our solution relies on object detection to recognize activity-critical objects in a frame. As it is infeasible to always expect a high-quality camera view, we enrich the vision features with inertial sensor data that captures the user's arm movement. In this way we aim to overcome the drawbacks of each individual sensor. We present the results of combining inertial and video features to recognize human activities across different scenarios, achieving an F1-measure of up to 79.6%.
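To illustrate the kind of fusion the abstract describes, the following is a minimal sketch of feature-level fusion: statistical features from an accelerometer window are concatenated with per-class object-detection confidences and fed to a standard classifier. The object classes, window length, feature set, and classifier choice are illustrative assumptions, not the authors' exact pipeline.

```python
# Hedged sketch of feature-level fusion of inertial and object-detection data.
# All concrete choices below (object classes, window size, classifier) are
# assumptions made for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

OBJECT_CLASSES = ["cup", "pill_box", "bottle"]  # hypothetical activity-critical objects


def inertial_features(window: np.ndarray) -> np.ndarray:
    """Simple statistics over a (samples, 3) accelerometer window."""
    return np.concatenate([window.mean(axis=0),
                           window.std(axis=0),
                           window.min(axis=0),
                           window.max(axis=0)])


def object_features(detections: dict) -> np.ndarray:
    """Maximum detector confidence per activity-critical class; 0.0 if absent."""
    return np.array([detections.get(c, 0.0) for c in OBJECT_CLASSES])


def fuse(window: np.ndarray, detections: dict) -> np.ndarray:
    """Concatenate inertial and vision features into one vector."""
    return np.concatenate([inertial_features(window), object_features(detections)])


# Toy data: 50 windows of 100 accelerometer samples with random detector scores,
# labelled with two dummy activities (e.g. "drink" vs. "other").
rng = np.random.default_rng(0)
X = np.stack([fuse(rng.normal(size=(100, 3)), {"cup": rng.random()})
              for _ in range(50)])
y = rng.integers(0, 2, size=50)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```

Any late-fusion variant (e.g. combining per-modality classifier scores) would follow the same pattern, differing only in where the two feature streams are joined.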
