Fast Segment Anything

The recently proposed segment anything model (SAM) has made a significant influence in many computer vision tasks. It is becoming a foundation step for many high-level tasks, like image segmentation, image caption, and image editing. However, its huge computation costs prevent it from wider applications in industry scenarios. The computation mainly comes from the Transformer architecture at high-resolution inputs. In this paper, we propose a speed-up alternative method for this fundamental task with comparable performance. By reformulating the task as segments-generation and prompting, we find that a regular CNN detector with an instance segmentation branch can also accomplish this task well. Specifically, we convert this task to the well-studied instance segmentation task and directly train the existing instance segmentation method using only 1/50 of the SA-1B dataset published by SAM authors. With our method, we achieve a comparable performance with the SAM method at 50 times higher run-time speed. We give sufficient experimental results to demonstrate its effectiveness. The codes and demos will be released at https://github.com/CASIA-IVA-Lab/FastSAM.

[1]  Ross B. Girshick,et al.  Segment Anything , 2023, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  A. Torralba,et al.  Detecting Everything in the Open World: Towards Universal Object Detection , 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  H. Liao,et al.  YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Wei Ji,et al.  Exploring Denoised Cross-video Contrast for Weakly-supervised Temporal Action Localization , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Haibin Ling,et al.  EDTER: Edge Detection with Transformer , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Dahun Kim,et al.  Learning Open-World Object Proposals without Learning to Classify , 2021, IEEE Robotics and Automation Letters.

[7]  Zeming Li,et al.  YOLOX: Exceeding YOLO Series in 2021 , 2021, ArXiv.

[8]  Alexander G. Schwing,et al.  Per-Pixel Classification is Not All You Need for Semantic Segmentation , 2021, NeurIPS.

[9]  Ilya Sutskever,et al.  Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.

[10]  Ling Shao,et al.  Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[12]  Ross B. Girshick,et al.  LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Carsten Steger,et al.  MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Yong Jae Lee,et al.  YOLACT: Real-Time Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Meng Lu,et al.  Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set , 2019, IEEE Transactions on Geoscience and Remote Sensing.

[16]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Serge J. Belongie,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[19]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[21]  Saining Xie,et al.  Holistically-Nested Edge Detection , 2015, International Journal of Computer Vision.

[22]  Ali Borji,et al.  Salient object detection: A survey , 2014, Computational Visual Media.

[23]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.

[24]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[25]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[27]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[28]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[30]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  智一 吉田,et al.  Efficient Graph-Based Image Segmentationを用いた圃場図自動作成手法の検討 , 2014 .