Multimodal Trip Hazard Affordance Detection on Construction Sites

Trip hazards are a significant contributor to accidents on construction and manufacturing sites. Current safety inspections are labor intensive and limited by human fallibility, making automation of trip hazard detection appealing from both a safety and an economic perspective. Trip hazards pose an interesting challenge to modern learning techniques because they are defined as much by affordance as by object type: wires on a table, for example, are not a trip hazard, but the same wires lying on the ground can be. To address these challenges, we conduct a comprehensive investigation into the performance characteristics of 11 different color and depth fusion approaches, including four fusion and one non-fusion approach, using color and two types of depth images. Trained and tested on more than 600 labeled trip hazards over four floors and 2,000 m² of an active construction site, our approach is able to differentiate between identical objects in different physical configurations. By fusing color and depth information, our multimodal trip detector achieves a 4% absolute improvement in F1-score over a color-only detector. These results and the extensive publicly available dataset move us one step closer to assistive or fully automated safety inspection systems on construction sites.
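The abstract does not specify the fusion architecture, but the general idea of combining a color stream and a depth stream before classifying a region as a trip hazard can be illustrated with a late-fusion sketch. The following is a minimal, assumed PyTorch example; the network name (ColorDepthFusionNet), layer sizes, and the choice of late fusion are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a two-stream color + depth "late fusion" classifier
# (assumed PyTorch implementation; not the paper's exact network).
import torch
import torch.nn as nn


def make_encoder(in_channels: int) -> nn.Sequential:
    """A small convolutional encoder for one modality (color or depth)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),  # global average pool -> (N, 64, 1, 1)
    )


class ColorDepthFusionNet(nn.Module):
    """Late fusion: encode each modality separately, concatenate the
    features, then classify a region as trip hazard / not trip hazard."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.rgb_encoder = make_encoder(in_channels=3)    # color stream
        self.depth_encoder = make_encoder(in_channels=1)  # depth stream
        self.classifier = nn.Linear(64 + 64, num_classes)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        f_rgb = self.rgb_encoder(rgb).flatten(1)        # (N, 64)
        f_depth = self.depth_encoder(depth).flatten(1)  # (N, 64)
        fused = torch.cat([f_rgb, f_depth], dim=1)      # (N, 128)
        return self.classifier(fused)


if __name__ == "__main__":
    net = ColorDepthFusionNet()
    rgb = torch.randn(4, 3, 224, 224)    # batch of color crops
    depth = torch.randn(4, 1, 224, 224)  # corresponding depth crops
    logits = net(rgb, depth)
    print(logits.shape)  # torch.Size([4, 2])
```

A color-only baseline corresponds to dropping the depth stream; the reported gain is measured in F1-score, the usual harmonic mean of precision P and recall R, F1 = 2PR / (P + R).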
