Early Hierarchical Contexts Learned by Convolutional Networks for Image Segmentation

We propose a foreground segmentation method based on convolutional networks. To predict the label of a pixel in an image, the model takes a hierarchical context as input, obtained by combining multiple context patches at different scales around that pixel. Short-range contexts depict local details, while long-range contexts capture the object-scene relationships in the image. "Early" means that the context patches of a pixel are combined into a hierarchical one before any trainable layers are learned, i.e., early-combining. In contrast, late-combining means that the combination occurs later, e.g., after the convolutional feature extractor of the network has already been learned. We find that, in our task, it is vital for the whole model to jointly learn the patterns of contexts at different scales. Experiments show that early-combining performs better than late-combining. On the dataset built by Baidu IDL for a recent person segmentation contest, our method beats all competitors by a considerable margin. Qualitative results also suggest that the proposed method is almost ready for practical application.
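To make the idea of early-combining concrete, the following is a minimal sketch of how a hierarchical context for one pixel might be assembled before any trainable layer sees it. The patch sizes, the zero padding at image borders, the nearest-neighbour resizing, and the channel-wise stacking are all assumptions made for illustration; the abstract does not specify these details, and the function names are hypothetical.

```python
import numpy as np


def extract_patch(image, row, col, size):
    """Crop a size x size patch centered at (row, col), zero-padding at borders."""
    half = size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="constant")
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so this slice starts `half` pixels above and to the left of it.
    return padded[row:row + size, col:col + size, :]


def resize_nearest(patch, out_size):
    """Nearest-neighbour resize of a square patch to out_size x out_size."""
    size = patch.shape[0]
    idx = (np.arange(out_size) * size) // out_size
    return patch[idx][:, idx]


def hierarchical_context(image, row, col, sizes=(16, 32, 64), out_size=16):
    """Combine context patches at several scales into one multi-channel input.

    Assumption: every patch is resized to out_size and the patches are
    stacked along the channel axis, so a single network can learn from
    all scales jointly (early-combining).
    """
    patches = [
        resize_nearest(extract_patch(image, row, col, s), out_size)
        for s in sizes
    ]
    return np.concatenate(patches, axis=-1)


if __name__ == "__main__":
    image = np.random.rand(100, 100, 3)
    context = hierarchical_context(image, 50, 50)
    print(context.shape)  # three RGB patches stacked: (16, 16, 9)
```

Under this sketch, a late-combining variant would instead pass each scale's patch through its own feature extractor and merge the resulting features afterwards, which is the alternative the abstract reports as performing worse.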
