Semantic Segmentation of Underwater Imagery: Dataset and Benchmark

In this paper, we present the first large-scale dataset for semantic Segmentation of Underwater IMagery (SUIM). It contains over 1500 images with pixel annotations for eight categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, sea-floor, and the background waterbody. The images were collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants. We also present a benchmark evaluation of state-of-the-art semantic segmentation approaches based on standard performance metrics. In addition, we present SUIM-Net, a fully-convolutional encoder-decoder model that balances performance against computational efficiency. It offers competitive performance while ensuring fast end-to-end inference, which is essential for its use in the autonomy pipeline of visually-guided underwater robots. In particular, we demonstrate its benefits for visual servoing, saliency prediction, and detailed scene understanding. With a variety of use cases, the proposed model and benchmark dataset open up promising opportunities for future research in underwater robot vision.
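The abstract does not name the "standard performance metrics" used in the benchmark. As an illustration only, the sketch below computes per-category intersection-over-union (IoU) and its mean, a metric commonly reported for semantic segmentation. The function names, the integer label encoding, and the NumPy-based implementation are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def per_class_iou(pred, target, num_classes):
    """Return an array of IoU values, one per class id in [0, num_classes).

    `pred` and `target` are integer label masks of the same shape.
    Classes absent from both masks get NaN so they can be skipped when averaging.
    """
    pred = np.asarray(pred).astype(np.int64).ravel()
    target = np.asarray(target).astype(np.int64).ravel()

    # Confusion matrix: rows index the ground-truth class, columns the predicted class.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)

    iou = np.full(num_classes, np.nan)
    present = union > 0
    iou[present] = intersection[present] / union[present]
    return iou


def mean_iou(pred, target, num_classes):
    """Average IoU over the classes that appear in either mask."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))


if __name__ == "__main__":
    # Toy 2x3 masks with three classes (hypothetical label ids 0, 1, 2).
    target = [[0, 0, 1], [1, 2, 2]]
    pred = [[0, 1, 1], [1, 2, 0]]
    print(per_class_iou(pred, target, num_classes=3))
    print(mean_iou(pred, target, num_classes=3))
```

For an eight-category label set such as the one described above, `num_classes` would be 8. How category names map to integer ids depends on how the annotation masks are decoded, which is not specified here.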
