A deep interactive segmentation method with user interaction-based attention module and polar transformation

Interactive segmentation, which extracts a specific foreground selected by user input, is widely employed in user-interactive applications such as image editing and ground-truth labeling. Because a single user input often produces an unsatisfactory result, most interactive segmentation methods iteratively refine the previous result using additional user interactions. A recently developed convolutional neural network (CNN)-based interactive segmentation method called deep interactive object selection has achieved high segmentation accuracy with fewer user interactions than earlier non-CNN-based approaches. However, its computational efficiency deteriorates because the feature extraction stage is repeated for each user interaction. Furthermore, it requires graph cut as a post-processing step to refine the boundary segments. To solve these problems, this paper presents a deep CNN-based interactive segmentation method employing a simple and effective user interaction-based attention module that does not require repeated feature extraction. In addition, we adopt a Cartesian-to-polar coordinate transformation to further improve segmentation performance. Experimental results demonstrate that the proposed method is superior to conventional ones in terms of segmentation accuracy and computational efficiency.
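
As an illustration of the Cartesian-to-polar coordinate transformation mentioned above, the following is a minimal sketch rather than the implementation used in the paper. It assumes the transform is centered on a user-chosen point (the abstract does not specify the center), uses nearest-neighbour sampling, and clamps samples that fall outside the image to the border. The function name `cartesian_to_polar` and the parameters `num_radii` and `num_angles` are hypothetical.

```python
import numpy as np


def cartesian_to_polar(image, center, num_radii=64, num_angles=128):
    """Resample `image` onto a (radius, angle) grid around `center` = (row, col).

    Illustrative assumptions: nearest-neighbour sampling, and samples that
    fall outside the image are clamped to the nearest border pixel.
    """
    h, w = image.shape[:2]
    cy, cx = center
    # Distance from the center to the farthest corner, so the grid covers the image.
    max_r = np.sqrt(max(cy, h - 1 - cy) ** 2 + max(cx, w - 1 - cx) ** 2)
    radii = np.linspace(0.0, max_r, num_radii)
    angles = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    r, theta = np.meshgrid(radii, angles, indexing="ij")
    # Polar-to-Cartesian lookup: each (r, theta) cell reads one source pixel.
    ys = np.clip(np.rint(cy + r * np.sin(theta)).astype(int), 0, h - 1)
    xs = np.clip(np.rint(cx + r * np.cos(theta)).astype(int), 0, w - 1)
    return image[ys, xs]


# Usage: a 5x5 test image, transformed around its central pixel.
image = np.arange(25, dtype=float).reshape(5, 5)
polar = cartesian_to_polar(image, center=(2, 2), num_radii=4, num_angles=8)
print(polar.shape)  # rows index radius, columns index angle
```

In this layout each row corresponds to one radius and each column to one angle, so a boundary that encircles the center point becomes a roughly horizontal curve in the output.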
