Generation of Action Recognition Training Data Through Rotoscoping and Augmentation of Synthetic Animations

In this paper, we present a method to synthetically generate the training material that machine learning algorithms need to perform human action recognition from 2D videos. As a baseline, we consider a pipeline in which a 2D video stream passes through a skeleton extractor (OpenPose) and the resulting 2D joint coordinates are classified by a random forest. This pipeline is trained and tested on real live videos. As an alternative, we propose training the random forest on automatically generated synthetic videos rendered from 3D animations. For each action, starting from a single reference live video, we edit a 3D animation in Blender using the rotoscoping technique. This reference animation is then used to produce a full training set of synthetic videos by perturbing the original animation curves. Our tests, performed on live videos, show that the alternative pipeline achieves comparable accuracy while drastically reducing both the human effort and the computing power needed to produce the training material.
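To make the baseline pipeline concrete, the sketch below shows one plausible way to classify OpenPose 2D keypoints with a random forest. The paper does not publish its implementation; the feature layout (OpenPose BODY_25 keypoints flattened per frame and stacked over a fixed-length window), the window length, the class count, and all names below are our assumptions rather than the authors' code, and placeholder random data stands in for real videos.

```python
# Hypothetical sketch of the baseline pipeline: OpenPose 2D keypoints -> random forest.
# Assumptions (not from the paper): BODY_25 keypoints, 30-frame windows, five action
# classes; features are the flattened (x, y) joint coordinates of each window.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

N_KEYPOINTS = 25   # OpenPose BODY_25 model (assumption)
WINDOW = 30        # frames per sample (assumption)
N_ACTIONS = 5      # number of action classes (assumption)

def window_to_features(keypoints):
    """Flatten a (WINDOW, N_KEYPOINTS, 2) array of (x, y) joints into one feature vector."""
    return keypoints.reshape(-1)

# Placeholder data; in practice each row would come from OpenPose output on a video window.
rng = np.random.default_rng(0)
X = rng.random((400, WINDOW * N_KEYPOINTS * 2))
y = rng.integers(0, N_ACTIONS, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

Similarly, the perturbation of the rotoscoped animation can be pictured as jitter applied to the keyframes of Blender F-curves. The following is a minimal sketch meant to run inside Blender's scripting environment; the armature name, the noise amplitude, and the choice of Gaussian value noise are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch (run inside Blender): perturb the keyframe values of every animation
# F-curve of a rotoscoped armature to generate one synthetic variant of the reference
# animation. "Armature" and SIGMA = 0.02 are illustrative assumptions.
import random
import bpy

SIGMA = 0.02  # noise amplitude in channel units (assumption)

obj = bpy.data.objects["Armature"]      # the rotoscoped skeleton (assumed object name)
action = obj.animation_data.action      # its current action, i.e. the reference animation

for fcurve in action.fcurves:           # one F-curve per animated channel
    for kp in fcurve.keyframe_points:
        kp.co[1] += random.gauss(0.0, SIGMA)  # jitter the keyframe value, keep its frame
    fcurve.update()                     # recompute curve handles after editing

# Rendering the perturbed animation yields one synthetic training video; repeating
# the loop with fresh noise produces the full training set.
bpy.context.scene.render.filepath = "//synthetic_variant_"
bpy.ops.render.render(animation=True)
```

In practice one would likely scale the noise per channel type (bone rotations versus locations) and draw fresh noise for each variant; the point of the sketch is only that a single rotoscoped animation plus cheap perturbation can stand in for many live recordings.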
