Paper Information - Semantic Segmentation of Underwater Imagery: Dataset and Benchmark

In this paper, we present the first large-scale dataset for semantic Segmentation of Underwater IMagery (SUIM). It contains over 1500 images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, sea-floor, and background (waterbody). The images were rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants. We also present a benchmark evaluation of state-of-the-art semantic segmentation approaches based on standard performance metrics. In addition, we present SUIM-Net, a fully-convolutional encoder-decoder model that balances the trade-off between performance and computational efficiency. It offers competitive performance while ensuring fast end-to-end inference, which is essential for its use in the autonomy pipeline of visually-guided underwater robots. In particular, we demonstrate its usability benefits for visual servoing, saliency prediction, and detailed scene understanding. With a variety of use cases, the proposed model and benchmark dataset open up promising opportunities for future research in underwater robot vision.
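
The abstract refers to "standard performance metrics" without naming them. As a rough illustration only, the sketch below computes per-class intersection-over-union (IoU) and its mean, a metric commonly used for semantic segmentation; whether this matches the paper's exact evaluation protocol is an assumption. The function names, the integer class ids, and the toy masks are all hypothetical and are not taken from the SUIM code base.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[int]]  # 2D grid of integer class ids (hypothetical encoding)


def per_class_iou(pred: Mask, truth: Mask, num_classes: int) -> Dict[int, float]:
    """Return the IoU of every class that appears in either mask."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Pixel agrees: it belongs to both the intersection and the union.
                inter[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Classes absent from both masks have an undefined IoU and are skipped.
    return {c: inter[c] / union[c] for c in range(num_classes) if union[c] > 0}


def mean_iou(ious: Dict[int, float]) -> float:
    """Average the per-class IoU values over the classes that are present."""
    return sum(ious.values()) / len(ious)


# Toy 2x3 masks with three made-up class ids (0, 1, 2).
truth_mask = [[0, 0, 1],
              [2, 2, 1]]
pred_mask = [[0, 1, 1],
             [2, 2, 2]]

ious = per_class_iou(pred_mask, truth_mask, num_classes=3)
print(ious)
print(mean_iou(ious))
```

Averaging over only the classes present in a given image pair is one of several conventions; benchmark code often accumulates intersections and unions over the whole test set before dividing.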
