Exploring Semi-Supervised Methods for Labeling Support in Multimodal Datasets

Working with multimodal datasets is challenging because it requires annotations that are often time-consuming and difficult to acquire. This applies in particular to video recordings, which often have to be watched in full before they can be labeled. Moreover, other modalities such as acceleration data are frequently recorded alongside a video. For this purpose, we created an annotation tool for datasets that combine video and inertial sensor data. In contrast to most existing approaches, we focus on semi-supervised labeling support that infers labels for the whole dataset: after a small set of instances has been labeled, our system can provide labeling recommendations. We rely on the acceleration data of a wrist-worn sensor to support the labeling of the video recording, applying template matching to identify time intervals of certain activities. We evaluate our approach on three datasets: one containing warehouse picking activities, one consisting of activities of daily living, and one about meal preparation. Our results show that the presented method is able to give annotators hints about possible label candidates.
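The abstract does not publish the implementation, but the template-matching step it describes is commonly realized with Dynamic Time Warping (DTW): a labeled example segment serves as a template that is slid across the unlabeled signal, and windows with a low warping distance become label candidates. The sketch below illustrates this idea under simplifying assumptions — a 1-D acceleration signal, a fixed-length window, and a hypothetical distance threshold; function names and parameters are illustrative, not the authors' code.

```python
def dtw_distance(a, b):
    # Classic dynamic-programming DTW between two 1-D sequences.
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping paths.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def match_template(signal, template, threshold):
    """Slide the template over the signal; return the (start, end)
    intervals whose DTW distance falls below the (assumed) threshold
    as label candidates for the annotator."""
    w = len(template)
    candidates = []
    for start in range(len(signal) - w + 1):
        window = signal[start:start + w]
        if dtw_distance(window, template) < threshold:
            candidates.append((start, start + w))
    return candidates
```

A real pipeline would additionally merge overlapping candidate intervals and map sample indices back to video timestamps so the recommendations appear on the annotation timeline.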
