Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines

The use of semantic attributes in computer vision problems has gained increasing popularity in recent years. Attributes provide an intermediate feature representation between low-level features and the class categories. They offer several attractive properties, including improved learning of novel categories from few examples and support for zero-shot learning. The major caveat, however, is that learning semantic attributes is laborious: it requires significant time and human effort to provide labels. To address this issue, we propose a weakly supervised approach to learning mid-level features in which the only supervision is the category label of each training example. We develop a novel extension of the restricted Boltzmann machine (RBM) with Beta-Bernoulli process priors. Unlike the standard RBM, our model uses the class labels to promote more efficient sharing of information among categories, which tends to improve generalization performance. Using semantic attributes for which annotations are available, we show that we can find correspondences between the mid-level features we learn and the labeled attributes. The mid-level features therefore have distinct semantic characterizations that closely resemble those given by the semantic attributes, even though the attribute labels were not used during training. Our experimental results on object recognition tasks show significant performance gains, outperforming methods that rely on manually labeled semantic attributes.
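The abstract does not give the model's equations, so the following is only an illustrative sketch of how a Beta-Bernoulli prior can interact with an RBM. It rests on several assumptions. A shared vector of probabilities `pi` is drawn from a finite Beta approximation, `Beta(alpha/K, beta*(K-1)/K)` with `K` hidden units. Each class draws its own binary mask over the hidden units from `pi`. During one step of contrastive divergence (CD-1) on an example of that class, only the hidden units whose mask entry is 1 can turn on. The functions `sample_class_masks`, `hidden_probs`, `visible_probs`, and `cd1_update` are hypothetical names, not the authors' formulation. The masks are also held fixed here rather than inferred.

```python
import math
import random


def sample_class_masks(num_classes, num_hidden, alpha=2.0, beta=1.0, rng=random):
    """Assumed finite Beta-Bernoulli approximation:
    pi_k ~ Beta(alpha/K, beta*(K-1)/K), shared by all classes,
    z[c][k] ~ Bernoulli(pi_k) independently for each class c."""
    K = num_hidden
    if K < 2:
        raise ValueError("need at least two hidden units")
    pi = [rng.betavariate(alpha / K, beta * (K - 1) / K) for _ in range(K)]
    masks = [[1 if rng.random() < pi[k] else 0 for k in range(K)]
             for _ in range(num_classes)]
    return pi, masks


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hidden_probs(v, W, b, z):
    # p(h_j = 1 | v, z) = z_j * sigmoid(b_j + sum_i v_i W_ij); masked units stay off.
    return [z[j] * sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, c):
    # p(v_i = 1 | h) = sigmoid(c_i + sum_j W_ij h_j) for binary visible units.
    return [sigmoid(c[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(c))]


def cd1_update(v0, z, W, c, b, lr=0.05, rng=random):
    """One CD-1 step on a single binary example v0 whose class mask is z.
    Updates weights W, visible biases c and hidden biases b in place."""
    ph0 = hidden_probs(v0, W, b, z)                  # positive phase
    h0 = [1 if rng.random() < p else 0 for p in ph0]
    pv1 = visible_probs(h0, W, c)                    # one Gibbs step back
    v1 = [1 if rng.random() < p else 0 for p in pv1]
    ph1 = hidden_probs(v1, W, b, z)                  # negative phase
    # Units with z_j = 0 have ph0[j] = ph1[j] = 0, so they receive zero gradient.
    for i in range(len(v0)):
        for j in range(len(b)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        c[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])


if __name__ == "__main__":
    rng = random.Random(0)
    n_visible, n_hidden, n_classes = 6, 8, 3
    pi, masks = sample_class_masks(n_classes, n_hidden, rng=rng)
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    c, b = [0.0] * n_visible, [0.0] * n_hidden
    v = [1, 0, 1, 1, 0, 0]  # toy binary input, assumed to belong to class 0
    for _ in range(10):
        cd1_update(v, masks[0], W, c, b, rng=rng)
    print("hidden units available to class 0:",
          [j for j, zj in enumerate(masks[0]) if zj])
```

Every class draws its mask from the same `pi`. A hidden unit with a large `pi_k` therefore tends to be enabled for many classes and is trained on examples from all of them. This is one plausible reading of how class labels can promote sharing among categories. A complete treatment would presumably also infer `pi` and the masks from data rather than fixing them.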
