Constrained fixation point based segmentation via deep neural network

Abstract: In interactive image segmentation, clicking points with a mouse is an explicit interaction mode, whereas using fixation points from an eye-tracking device is an implicit one. Both modes provide a series of points. Inspired by the similarity between these two interaction modes, we propose a novel neural network based on the human visual system (HVS) that transfers constrained fixation-point-based segmentation to clicking-point-based interactive segmentation. In brief, information in our model flows from the RGB image through a VGG-16 backbone and an LGN-like module (LGNL) to ConvLSTM blocks, mirroring the pathway of stimulus transmission and processing in the HVS: stimulus, retina, lateral geniculate nucleus (LGN), and visual cortex. First, the RGB image is fed to the VGG-16 backbone to obtain feature maps from multiple layers. Next, the LGNL fuses edge-aware and semantic features from different layers of the backbone at multiple resolutions to produce rich contextual features. Finally, guided by a fixation density map derived from the fixation points, a stack of ConvLSTM blocks converts the LGNL output feature maps into a segmentation map in a coarse-to-fine manner. Comprehensive experiments show that the proposed HVS-based network achieves higher segmentation performance than seven state-of-the-art methods, and confirm that transferring from constrained fixation points to clicking points is valid.
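The abstract does not say how the fixation points are turned into a fixation density map. A common choice, assumed here, is to place an isotropic Gaussian at each fixation and scale the sum to [0, 1]. The function name `fixation_density_map` and the kernel width `sigma` are hypothetical and illustrative only; this is a sketch, not the paper's procedure.

```python
import math


def fixation_density_map(points, height, width, sigma=10.0):
    """Sum an isotropic Gaussian centred on each fixation (row, col), then
    scale the result so the maximum is 1.

    Assumption: Gaussian splatting with a fixed, hypothetical sigma; the
    paper's actual transform may differ.
    """
    density = [[0.0] * width for _ in range(height)]
    two_sigma_sq = 2.0 * sigma * sigma
    for r0, c0 in points:
        for r in range(height):
            for c in range(width):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                density[r][c] += math.exp(-d2 / two_sigma_sq)
    peak = max(max(row) for row in density)
    if peak > 0:  # leave an all-zero map untouched when there are no points
        density = [[v / peak for v in row] for row in density]
    return density


# Two fixations on a small 32x32 canvas.
dmap = fixation_density_map([(8, 8), (20, 24)], height=32, width=32, sigma=4.0)
```

Each fixation becomes a peak of value close to 1, and the map decays smoothly away from it, which gives the network a dense spatial cue instead of isolated points. The same construction would apply unchanged to mouse clicks, which is the similarity the abstract relies on.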
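For readers unfamiliar with ConvLSTM, a single cell update can be sketched in NumPy. This follows the standard ConvLSTM gating (input, forget, output gates and a candidate state computed by convolutions over the concatenated input and hidden state) but omits the peephole terms of the original formulation. The names `conv2d_same` and `convlstm_step`, the sizes, and the idea of feeding the density map as an extra input channel are assumptions for illustration; the abstract does not specify how the paper injects the fixation guidance.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Zero-padded 2-D cross-correlation (as in deep-learning frameworks).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted input, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out + b[:, None, None]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def convlstm_step(x, h, c, w, b):
    """One ConvLSTM update without peepholes.
    x: (C_in, H, W); h, c: (C_hid, H, W);
    w: (4*C_hid, C_in+C_hid, k, k); b: (4*C_hid,)."""
    z = conv2d_same(np.concatenate([x, h], axis=0), w, b)
    i, f, o, g = np.split(z, 4, axis=0)        # gate pre-activations
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_next = f * c + i * g                     # update the cell state
    h_next = o * np.tanh(c_next)               # gated hidden state
    return h_next, c_next


rng = np.random.default_rng(0)
c_in, c_hid, k, size = 3, 8, 3, 16
w = rng.normal(scale=0.1, size=(4 * c_hid, c_in + c_hid, k, k))
b = np.zeros(4 * c_hid)
h = np.zeros((c_hid, size, size))
c = np.zeros((c_hid, size, size))
# Hypothetical input: e.g. fused features stacked with a density-map channel.
x = rng.normal(size=(c_in, size, size))
for _ in range(3):  # refine the same input over a few recurrent steps
    h, c = convlstm_step(x, h, c, w, b)
print(h.shape)  # (8, 16, 16)
```

Because every gate is a convolution rather than a dense layer, the hidden state keeps its spatial layout, which is what lets a stack of such cells refine a segmentation map step by step.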