Semantic Segmentation of Railway Images Considering Temporal Continuity

In this paper, we focus on the semantic segmentation of images taken from a camera mounted on the front end of a train, for the purpose of measuring and managing rail-side facilities. Because these tasks are currently done manually, improving their efficiency and possibly automating them is crucial. We aim to realize this by capturing information about the railway environment through the semantic segmentation of train front-view camera images. Specifically, assuming that the lateral movement of trains is smooth, we propose a method that uses information from multiple frames to account for temporal continuity during semantic segmentation. Based on the densely estimated optical flow between sequential frames, the class likelihoods of the pixels corresponding to each pixel of the focused frame are combined by a weighted mean. We also construct a new dataset consisting of train front-view camera images and their annotations for semantic segmentation. The proposed method outperforms a conventional single-frame semantic segmentation model, and the use of class likelihoods for the frame combination also proved effective.
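The frame-combination step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `warp_to_focus` and `fuse_likelihoods`, the nearest-neighbour lookup, the handling of pixels that leave the image, and the per-frame scalar weights are all assumptions made for illustration, since the abstract does not specify them. The sketch assumes that per-frame class likelihoods and the dense flow from the focused frame to each neighbouring frame have already been computed by some segmentation model and some flow estimator.

```python
import numpy as np


def warp_to_focus(likelihood, flow):
    """Look up a neighbouring frame's likelihoods at the flow-displaced positions.

    likelihood: (H, W, C) class likelihoods of the neighbouring frame.
    flow: (H, W, 2) dense flow from the focused frame to the neighbour, as (dx, dy).
    Returns the warped (H, W, C) array and an (H, W) boolean validity mask.
    """
    h, w, _ = likelihood.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Assumption: nearest-neighbour lookup; bilinear sampling is a common alternative.
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    # Pixels whose correspondence falls outside the image are marked invalid.
    valid = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    warped = likelihood[np.clip(yt, 0, h - 1), np.clip(xt, 0, w - 1)]
    return warped, valid


def fuse_likelihoods(focus, neighbours, flows, weights, focus_weight=1.0):
    """Weighted mean of class likelihoods over corresponding pixels.

    focus: (H, W, C) likelihoods of the focused frame.
    neighbours, flows, weights: per-neighbour likelihoods, flows, and scalar
    weights (hypothetical; the weighting scheme is an assumption).
    focus_weight must be positive so that every pixel has a nonzero total weight.
    Returns the fused (H, W, C) likelihoods and the (H, W) label map.
    """
    num = focus_weight * focus.astype(float)
    den = np.full(focus.shape[:2], focus_weight, dtype=float)
    for lik, flow, wgt in zip(neighbours, flows, weights):
        warped, valid = warp_to_focus(lik, flow)
        # Invalid correspondences contribute neither likelihood nor weight.
        num += wgt * valid[..., None] * warped
        den += wgt * valid
    fused = num / den[..., None]
    return fused, fused.argmax(axis=-1)
```

Averaging likelihoods rather than hard labels lets a confident prediction in one frame outweigh an uncertain one in another, which is in line with the abstract's remark that using class likelihoods for the combination proved effective.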
