Weakly-Labelled Semantic Segmentation of Fish Objects in Underwater Videos Using a Deep Residual Network

We propose a 152-layer Fully Convolutional Residual Network (ResNet-FCN) for non-motion-based semantic segmentation of fish objects in underwater videos that is robust to varying backgrounds and changes in illumination. For supervised training, we use weakly-labelled ground truth derived from motion-based adaptive Mixture of Gaussians (MoG) background subtraction. On videos taken at six different sites at a benthic depth of around 10 m, ResNet-FCN achieves a fish-object average precision of 65.91% and an average recall of 83.99%. The network correctly segments fish objects using only color input features, without the need for motion cues, and it detects fish objects even in frames with strong illumination changes caused by wave motion at the sea surface. It also segments fish objects located far from the camera despite varying benthic background appearance and differences in aquatic hues.
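The weak labels come from motion-based MoG background subtraction: each pixel keeps a small mixture of Gaussians over its recent intensities, and a pixel that no high-weight, low-variance "background" component explains is marked as foreground. The sketch below illustrates that idea on grayscale frames stored as nested lists. It is not the authors' implementation: it assumes a fixed number of components and a constant learning rate (a simplified Stauffer-Grimson-style update, not the adaptive variant that also selects the number of components per pixel), and every name and parameter value (`PixelMoG`, `weak_labels`, `alpha`, `bg_ratio`, and so on) is hypothetical. In practice, OpenCV's `cv2.createBackgroundSubtractorMOG2` provides a production adaptive MoG.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float


class PixelMoG:
    """Fixed-K Gaussian mixture for one grayscale pixel (simplified update).

    All default parameter values are illustrative assumptions, not values from the paper.
    """

    def __init__(self, k=3, alpha=0.01, init_var=225.0, min_var=4.0,
                 match_sigma=2.5, bg_ratio=0.7):
        self.k = k                      # maximum number of components
        self.alpha = alpha              # learning rate for weights, means, variances
        self.init_var = init_var        # variance given to a newly created component
        self.min_var = min_var          # floor that keeps variances from collapsing
        self.match_sigma = match_sigma  # match if |x - mean| <= match_sigma * std
        self.bg_ratio = bg_ratio        # cumulative weight treated as background
        self.comps: list[Gaussian] = []

    def _fitness(self, g):
        # Stable, frequently observed components have high weight and low variance.
        return g.weight / math.sqrt(g.var)

    def _background(self):
        # Highest-fitness components whose cumulative weight first exceeds bg_ratio.
        bg, total = [], 0.0
        for g in sorted(self.comps, key=self._fitness, reverse=True):
            bg.append(g)
            total += g.weight
            if total > self.bg_ratio:
                break
        return bg

    def update(self, x):
        """Classify x against the current model, then update it; True means foreground."""
        matched = next((g for g in self.comps
                        if (x - g.mean) ** 2 <= self.match_sigma ** 2 * g.var), None)
        is_fg = matched is None or not any(g is matched for g in self._background())

        # Decay every weight; only the matched component is reinforced.
        for g in self.comps:
            g.weight = (1 - self.alpha) * g.weight + (self.alpha if g is matched else 0.0)

        if matched is not None:
            # Constant learning rate: a simplification of the original update rule.
            d = x - matched.mean
            matched.mean += self.alpha * d
            matched.var = max((1 - self.alpha) * matched.var + self.alpha * d * d,
                              self.min_var)
        else:
            # No match: add a wide component at x, or replace the least-fit one.
            new = Gaussian(weight=self.alpha, mean=x, var=self.init_var)
            if len(self.comps) < self.k:
                self.comps.append(new)
            else:
                worst = min(range(len(self.comps)),
                            key=lambda i: self._fitness(self.comps[i]))
                self.comps[worst] = new

        total = sum(g.weight for g in self.comps)
        for g in self.comps:
            g.weight /= total
        return is_fg


def weak_labels(frames, **params):
    """Return one binary mask per frame (1 = moving, candidate fish pixel)."""
    h, w = len(frames[0]), len(frames[0][0])
    models = [[PixelMoG(**params) for _ in range(w)] for _ in range(h)]
    return [[[int(models[r][c].update(frame[r][c])) for c in range(w)]
             for r in range(h)]
            for frame in frames]


# Static background of intensity 50; a bright "fish" pixel appears in the last frame.
frames = [[[50.0] * 4 for _ in range(4)] for _ in range(30)]
frames[-1][1][2] = 200.0
masks = weak_labels(frames)
print(masks[-1])  # [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The first mask is entirely foreground because every pixel model starts empty, so a real pipeline would discard a warm-up period. Masks produced this way flag only pixels that deviate from the learned background, so they can miss stationary fish and include moving background; this is one reason such labels are treated as weak.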