Augmenting CRFs with Boltzmann Machine Shape Priors for Image Labeling

Conditional random fields (CRFs) are powerful tools for building models that label image segments. They are particularly well suited to modeling local interactions among adjacent regions (e.g., superpixels). However, CRFs are limited in their ability to capture complex, global (long-range) interactions between regions. Restricted Boltzmann machines (RBMs) are complementary: they can model the global shapes produced by segmentation models. In this work, we present a new model that combines these two model types to build a state-of-the-art labeler. Although the CRF is a good baseline labeler, we show how an RBM can be added to the architecture to provide a global shape bias that complements the local modeling provided by the CRF. We demonstrate the model's performance on labeling the parts of complex face images from the Labeled Faces in the Wild data set. The hybrid model produces results that are both quantitatively and qualitatively better than those of the CRF alone. In addition to high-quality labeling results, we demonstrate that the hidden units in the RBM portion of our model can be interpreted as face attributes that have been learned without any attribute-level supervision.
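
As a rough illustration of how a local CRF term and a global RBM shape term might be combined, the following minimal sketch scores a labeling by the sum of a CRF energy and an RBM free energy. It is not the model's actual formulation. The function names, the one-hot visible encoding, and the assumption that the RBM sees the region labels directly are all hypothetical choices made for this example, and the coupling used in the paper may differ.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def crf_energy(labels, unary, edges, pairwise):
    # Local terms (hypothetical form): per-region label scores plus a
    # label-compatibility score for every pair of adjacent regions.
    energy = -sum(unary[s][labels[s]] for s in range(len(labels)))
    energy -= sum(pairwise[labels[i]][labels[j]] for i, j in edges)
    return energy


def rbm_free_energy(labels, num_labels, weights, hid_bias, vis_bias):
    # Global term: RBM free energy with the binary hidden units summed out.
    # Assumption: the visible layer is the flattened one-hot labeling.
    v = [1.0 if labels[s] == l else 0.0
         for s in range(len(labels)) for l in range(num_labels)]
    energy = -sum(c * x for c, x in zip(vis_bias, v))
    for k in range(len(hid_bias)):
        activation = hid_bias[k] + sum(weights[i][k] * v[i] for i in range(len(v)))
        energy -= softplus(activation)
    return energy


def hybrid_energy(labels, num_labels, unary, edges, pairwise,
                  weights, hid_bias, vis_bias):
    # Lower energy means a more probable labeling under this toy model.
    return (crf_energy(labels, unary, edges, pairwise)
            + rbm_free_energy(labels, num_labels, weights, hid_bias, vis_bias))


# Toy usage: three regions in a chain, two labels, two hidden units.
labels = [0, 1, 1]
unary = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
edges = [(0, 1), (1, 2)]
pairwise = [[0.5, 0.0], [0.0, 0.5]]
weights = [[0.0, 0.0] for _ in range(6)]  # 6 visible units x 2 hidden units
hid_bias = [0.0, 0.0]
vis_bias = [0.0] * 6
total = hybrid_energy(labels, 2, unary, edges, pairwise,
                      weights, hid_bias, vis_bias)
```

In this sketch the CRF term depends only on individual regions and adjacent pairs, while each hidden unit in the RBM term is connected to every region's label and can therefore reward or penalize whole-image label configurations. With all RBM weights at zero, as in the toy usage, the RBM term is the same for every labeling and the model reduces to the CRF alone.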
