D-STC: Deep learning with spatio-temporal constraints for train drivers detection from videos

Abstract

Video-based monitoring of train driver operations is an emerging requirement for train safety management and for regularizing driving operations. In recent years, deep learning methods such as Faster R-CNN have achieved excellent detection performance on still images. However, they are not designed specifically for object detection in videos, and they are particularly ill-suited to train drivers, who often make only slight movements in monitoring footage. Moreover, the spatial and temporal information in videos has not been jointly exploited to address this problem. In this paper, a new framework, D-STC, is proposed to handle the complex conditions in a train cab and to detect train drivers from videos more reliably. The framework first fine-tunes a Faster R-CNN model to detect train drivers and produce initial detection results. The initial detections in each frame are then processed with customized spatial constraints to suppress false detections. Finally, an optimal threshold adjustment mechanism is applied across the whole video sequence to further improve detection accuracy. D-STC improves the accuracy of train driver detection while preserving the detection speed required for video. Experimental results demonstrate the effectiveness of the proposed framework.
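The abstract describes a three-stage pipeline but does not specify the spatial constraints or the threshold-selection criterion. The sketch below illustrates only the shape of stages two and three under loudly hypothetical assumptions; it is not the authors' method. It assumes the fine-tuned Faster R-CNN stage has already produced scored boxes per frame. The names `Detection`, `spatial_filter` and `select_video_threshold`, the cab region of interest `roi`, the box-area bounds, the per-frame cap `max_drivers`, and the "most frames with exactly `expected_drivers` detections" criterion are all illustrative stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# (x1, y1, x2, y2) in pixel coordinates
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Detection:
    """One initial detection, e.g. from a fine-tuned Faster R-CNN model."""
    frame: int
    box: Box
    score: float


def satisfies_spatial_constraints(det: Detection, roi: Box,
                                  min_area: float, max_area: float) -> bool:
    """Hypothetical constraint: box center lies in the cab ROI and its area is plausible."""
    x1, y1, x2, y2 = det.box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    rx1, ry1, rx2, ry2 = roi
    area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return rx1 <= cx <= rx2 and ry1 <= cy <= ry2 and min_area <= area <= max_area


def spatial_filter(dets: Sequence[Detection], roi: Box, min_area: float,
                   max_area: float, max_drivers: int) -> Dict[int, List[Detection]]:
    """Per frame: drop detections that violate the constraints, then keep at most
    `max_drivers` of the remaining boxes, highest score first."""
    per_frame: Dict[int, List[Detection]] = {}
    for det in dets:
        if satisfies_spatial_constraints(det, roi, min_area, max_area):
            per_frame.setdefault(det.frame, []).append(det)
    return {
        frame: sorted(kept, key=lambda d: d.score, reverse=True)[:max_drivers]
        for frame, kept in per_frame.items()
    }


def select_video_threshold(per_frame: Dict[int, List[Detection]], num_frames: int,
                           expected_drivers: int, candidates: Sequence[float]) -> float:
    """Choose one score threshold for the whole video. Hypothetical criterion: the
    candidate under which the most frames contain exactly `expected_drivers`
    detections; ties are broken in favor of the higher threshold."""
    def frames_matching(t: float) -> int:
        return sum(
            1 for f in range(num_frames)
            if sum(d.score >= t for d in per_frame.get(f, [])) == expected_drivers
        )
    return max(candidates, key=lambda t: (frames_matching(t), t))


if __name__ == "__main__":
    roi = (100.0, 50.0, 500.0, 400.0)  # hypothetical driver-seat region
    raw = [
        Detection(0, (200, 100, 300, 300), 0.92),  # driver
        Detection(0, (600, 100, 700, 300), 0.88),  # center outside the ROI
        Detection(1, (205, 102, 305, 302), 0.55),  # driver, lower score
        Detection(1, (120, 60, 140, 80), 0.70),    # box too small
        Detection(2, (210, 100, 310, 300), 0.81),  # driver
        Detection(2, (320, 120, 420, 320), 0.35),  # weak false positive
    ]
    kept = spatial_filter(raw, roi, min_area=5000, max_area=60000, max_drivers=2)
    print(select_video_threshold(kept, num_frames=3, expected_drivers=1,
                                 candidates=[0.3, 0.5, 0.7, 0.9]))  # prints 0.5
```

Choosing a single threshold for the whole sequence, rather than one per frame, mirrors the abstract's video-level adjustment step. In the toy example, 0.5 is the only candidate under which every frame keeps exactly one driver. The criterion actually used in D-STC may differ.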
