Vision-based action recognition of construction workers using dense trajectories

Widespread surveillance cameras on construction sites provide a large amount of information for construction management. The emergence of computer vision and machine learning technologies enables automated recognition of construction activities from videos. Because workers are the executors of construction, their activities have a strong impact on productivity and progress. Compared to machine work, manual work is more subjective and may differ greatly in operation flow and productivity among individuals. Hence only a handful of studies address vision-based action recognition of construction workers. The lack of publicly available datasets is one of the main factors currently hindering progress. This paper studies worker actions comprehensively, abstracts 11 common types of actions from 5 trades, and establishes a new real-world video dataset with 1176 instances. For action recognition, a cutting-edge video description method, dense trajectories, is applied. Support vector machines are integrated with a bag-of-features pipeline for action learning and classification. Performance is evaluated for multiple types of descriptors, namely Histograms of Oriented Gradients (HOG), Histograms of Optical Flow (HOF), and Motion Boundary Histograms (MBH), and for their combination. Different parameter settings are discussed, and a comparison with the state-of-the-art method is provided. Experimental results show that the system with a codebook size of 500 and the MBH descriptor achieves an average accuracy of 59% for worker action recognition, outperforming the state-of-the-art result by 24%.
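The bag-of-features step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_codebook`, `encode`), the plain k-means clustering, and the L1 normalization are assumptions chosen for illustration, and the toy 2-D vectors stand in for real HOG, HOF, or MBH descriptors computed along dense trajectories. The resulting per-video histogram is what would then be passed to a support vector machine, a step omitted here.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Vector], d: Vector) -> int:
    """Index of the codeword closest to descriptor d."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], d))


def build_codebook(descriptors: Sequence[Vector], k: int,
                   iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Cluster training descriptors into k codewords with plain k-means.

    Hypothetical helper: the abstract reports a codebook size of 500;
    k is kept tiny in the usage example below.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(list(descriptors), k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for d in descriptors:
            buckets[nearest(centers, d)].append(d)
        for i, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def encode(codebook: Sequence[Vector],
           descriptors: Sequence[Vector]) -> List[float]:
    """Encode one video as an L1-normalized histogram of codeword counts."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy usage: two well-separated groups of 2-D "descriptors".
train = [[0, 0], [0, 1], [10, 10], [10, 11]]
codebook = build_codebook(train, k=2)
video_hist = encode(codebook, [[0, 1], [9, 9], [10, 11], [1, 0]])
```

Each video is reduced to a fixed-length vector regardless of how many trajectories it contains, which is what makes a standard classifier applicable to variable-length clips.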
