Scene Parsing With Integration of Parametric and Non-Parametric Models

We adopt convolutional neural networks (CNNs) as our parametric model to learn discriminative features and classifiers for local patch classification. Based on the occurrence frequency distribution of classes, an ensemble of CNNs (CNN-Ensemble) is learned, in which each CNN component focuses on learning different and complementary visual patterns. The CNN-Ensemble outputs the local beliefs of pixels. Because visually similar pixels are indistinguishable under local context alone, we leverage global scene semantics to alleviate the local ambiguity. The global scene constraint is imposed mathematically by adding a global energy term to the labeling energy function, and it is estimated in practice within a non-parametric framework. A large-margin CNN metric learning method is also proposed for better global belief estimation. Finally, the local and global beliefs are integrated to give the class likelihood of each pixel, on which maximum marginal inference is performed to generate the label prediction maps. Even without any post-processing, we achieve state-of-the-art results on the challenging SiftFlow and Barcelona benchmarks.
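The integration of local and global beliefs described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of the energy function, so the sketch assumes that each energy is a negative log-belief, that the global belief is an average of class histograms over the k nearest training images, and that plain Euclidean distance stands in for the learned CNN metric. The function names `global_belief` and `integrate`, the `weight` parameter, and the toy numbers are all hypothetical.

```python
import math

EPS = 1e-6  # avoids log(0); an assumption of this sketch


def global_belief(query_feat, train_feats, train_label_hists, k=2):
    """Non-parametric global belief: average the class histograms of the
    k training images nearest to the query (Euclidean distance is a
    stand-in for the learned metric)."""
    dists = [math.dist(query_feat, f) for f in train_feats]
    nearest = sorted(range(len(train_feats)), key=dists.__getitem__)[:k]
    n_classes = len(train_label_hists[0])
    belief = [EPS + sum(train_label_hists[i][c] for i in nearest)
              for c in range(n_classes)]
    total = sum(belief)
    return [b / total for b in belief]


def integrate(local_beliefs, g_belief, weight=1.0):
    """Add a global energy term to each pixel's local energy and pick the
    class with the lowest total energy, independently per pixel."""
    labels = []
    for pixel in local_beliefs:
        energy = [-math.log(max(pl, EPS)) - weight * math.log(max(pg, EPS))
                  for pl, pg in zip(pixel, g_belief)]
        labels.append(min(range(len(energy)), key=energy.__getitem__))
    return labels


# Toy data: three training images with 2-D global features and 3-class histograms.
train_feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
train_hists = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.0, 0.1]]

g = global_belief([0.05, 0.0], train_feats, train_hists, k=2)

# First pixel is locally ambiguous between classes 0 and 1; second is confident.
local = [[0.50, 0.45, 0.05], [0.90, 0.05, 0.05]]
labels = integrate(local, g)
print(labels)
```

In this toy case the first pixel slightly prefers class 0 locally, but the retrieved neighbors are dominated by class 1, so the combined energy selects class 1; the second pixel keeps its confident local label. The per-pixel minimum used here is a simplification of the inference step, chosen only to keep the example short.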
