UDD: An Underwater Open-sea Farm Object Detection Dataset for Underwater Robot Picking

To promote the development of underwater robot picking in sea farms, we propose an underwater open-sea farm object detection dataset called UDD. Concretely, UDD consists of 2,227 images covering 3 categories (sea cucumber, sea urchin, and scallop). To the best of our knowledge, it is the first dataset collected in a real open-sea farm for underwater robot picking. We also propose a novel Poisson-blending-embedded Generative Adversarial Network (Poisson GAN) to overcome two issues in UDD: class imbalance and the large number of small objects. By using Poisson GAN to change the number, position, and even size of objects in UDD, we construct a large-scale augmented dataset (AUDD) containing 18K images. In addition, to better adapt the detector to the underwater picking environment, we propose a pre-training dataset (Pre-trained dataset) containing 590K images. Finally, we design a lightweight network (UnderwaterNet) to address two problems: detecting small objects in cloudy underwater images and meeting the efficiency requirements of robots. Specifically, we design a depthwise-convolution-based Multi-scale Contextual Features Fusion (MFF) block and a Multi-scale Blursampling (MBP) module. Together they reduce the network to 1.3M parameters running at 48 FPS, without any loss in accuracy. Extensive experiments verify the effectiveness of the proposed UnderwaterNet, Poisson GAN, and the UDD, AUDD, and Pre-trained datasets.
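
Poisson GAN embeds Poisson blending so that pasted objects merge seamlessly with the background. As background only, the sketch below shows classical Poisson image editing (Pérez et al., 2003). It is not the authors' Poisson GAN. It assumes NumPy, single-channel float images, a paste mask that stays away from the image border, and plain Jacobi iteration. The function name `poisson_blend` and its parameters are hypothetical.

```python
import numpy as np


def poisson_blend(source, target, mask, iters=2000):
    """Seamlessly paste `source` into `target` where `mask` is True (hypothetical helper).

    All inputs are 2-D arrays of the same shape (one channel). Inside the mask we
    solve the discrete Poisson equation lap(f) = lap(source); outside the mask f is
    fixed to `target` (Dirichlet boundary). Plain Jacobi iteration is used.
    """
    src = np.asarray(source, dtype=np.float64)
    f = np.array(target, dtype=np.float64)  # copy; pixels outside the mask never change
    inside = np.asarray(mask, dtype=bool)
    if inside[0, :].any() or inside[-1, :].any() or inside[:, 0].any() or inside[:, -1].any():
        raise ValueError("mask must not touch the image border")

    # Guidance term: discrete Laplacian of the source, 4*g_p - (sum of 4 neighbours).
    lap_src = np.zeros_like(src)
    lap_src[1:-1, 1:-1] = (4.0 * src[1:-1, 1:-1]
                           - src[:-2, 1:-1] - src[2:, 1:-1]
                           - src[1:-1, :-2] - src[1:-1, 2:])

    for _ in range(iters):
        # Sum of the 4 neighbours under the current estimate (pixels outside the mask come from target).
        nb = np.zeros_like(f)
        nb[1:-1, 1:-1] = f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
        # Jacobi step for 4*f_p - sum(f_q) = lap_src_p, applied only inside the mask.
        f[inside] = ((nb + lap_src) / 4.0)[inside]
    return f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.random((10, 10))
    patch = np.zeros((10, 10))
    patch[3:7, 3:7] = 1.0           # a bright square "object"
    mask = np.zeros((10, 10), dtype=bool)
    mask[2:8, 2:8] = True           # paste region, away from the border
    blended = poisson_blend(patch, background, mask)
    print(blended.round(2))
```

Only the Laplacian of the source enters the equation. As a result, a constant brightness offset between the pasted object and the background is absorbed by the boundary condition, and the seam is hidden better than with plain copy-and-paste.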

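The parameter savings that the MFF block draws from depthwise convolution can be illustrated with simple arithmetic. The sketch below counts weights (biases ignored) for three cases:

- a standard k x k convolution;
- a depthwise-separable convolution;
- a hypothetical three-branch multi-scale depthwise block with 3, 5, and 7 kernels fused by one 1 x 1 convolution.

The channel width of 128 and the branch layout are assumptions chosen for illustration. They are not the actual MFF configuration, and the numbers do not reproduce the 1.3M total.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights of a k x k depthwise conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels (bias ignored)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def multiscale_depthwise_params(kernels, c_in, c_out):
    """Hypothetical multi-scale block: one depthwise conv per kernel size on the
    same c_in channels, outputs concatenated, then a single 1 x 1 fusion conv."""
    depthwise = sum(k * k * c_in for k in kernels)
    fusion = len(kernels) * c_in * c_out
    return depthwise + fusion


if __name__ == "__main__":
    c = 128  # assumed channel width
    print(conv_params(3, c, c))                              # 147456
    print(depthwise_separable_params(3, c, c))               # 17536
    print(sum(conv_params(k, c, c) for k in (3, 5, 7)))      # 1359872
    print(multiscale_depthwise_params((3, 5, 7), c, c))      # 59776
```

With 128 input and output channels, the depthwise-separable layer needs about 8.4 times fewer weights than the standard 3 x 3 convolution. The multi-scale variant needs about 22.7 times fewer than three parallel standard convolutions.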