Probabilistic bilinear models for appearance-based vision

We present a probabilistic approach to learning object representations based on the "content and style" bilinear generative model of Tenenbaum and Freeman. In contrast to their earlier SVD-based method, ours models images using particle filters. We maintain separate particle filters for the content and style spaces, which lets us define arbitrary weighting functions over the particles when estimating the content and style densities. We combine this with a new EM-based method for learning the basis vectors that describe content-style mixing. The particle-based representation permits good reconstruction despite the reduced dimensionality, and increases both storage capacity and computational efficiency. We also describe how learning the distributions with particle filters allows a probabilistic "novelty" term to be computed efficiently. As an example application, we consider a dataset of faces under different lighting conditions: the system classifies faces of people it has seen before and identifies previously unseen faces as new content. A probabilistic definition of novelty, combined with learned content-style separability, provides a crucial building block for real-world, real-time object recognition systems.
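To make the setup concrete, the following is a minimal sketch (not the authors' code) of a symmetric bilinear content-style model in the spirit of Tenenbaum and Freeman: an image is rendered as y_d = sum_{i,j} W_{dij} s_i c_j from a style vector s and a content vector c, and particles over (style, content) are weighted by a Gaussian image likelihood. All names, dimensions, and the joint weighting of particle pairs (rather than separate content and style filters, as in the paper) are simplifying assumptions for illustration; the basis tensor W would be learned, e.g., by an EM procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper):
D, S, C = 64, 3, 4                 # image dim, style dim, content dim
W = rng.normal(size=(D, S, C))     # bilinear basis tensor (would be learned)

def render(W, style, content):
    """Reconstruct an image: y_d = sum_{i,j} W_{dij} * style_i * content_j."""
    return np.einsum('dij,i,j->d', W, style, content)

def particle_weights(obs, W, styles, contents, sigma=0.1):
    """Weight each (style, content) particle pair by a Gaussian image likelihood."""
    recon = np.einsum('dij,ni,nj->nd', W, styles, contents)   # one image per particle
    err = ((obs - recon) ** 2).sum(axis=1)
    w = np.exp(-err / (2.0 * sigma ** 2))
    return w / w.sum()

# Toy usage: propose particles, weight them against an observed image, and take
# the weighted means as point estimates of style and content. A uniformly low
# unnormalized likelihood over all content particles could be flagged as "novel"
# content (a simplified stand-in for the paper's probabilistic novelty term).
N = 500
styles = rng.normal(size=(N, S))
contents = rng.normal(size=(N, C))
obs = render(W, rng.normal(size=S), rng.normal(size=C))

w = particle_weights(obs, W, styles, contents)
style_est = w @ styles
content_est = w @ contents
```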

[1] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A strategy employed by V1?" Vision Research, 1997.

[2] A. Pentland et al., "Probabilistic object recognition and localization," Proc. Seventh IEEE International Conference on Computer Vision (ICCV), 1999.

[3] A. Pentland, B. Moghaddam, and T. Starner, "View-based and modular eigenspaces for face recognition," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1994.

[4] D. B. Grimes and R. P. N. Rao, "A bilinear model for sparse coding," Advances in Neural Information Processing Systems (NIPS), 2002.

[5] N. Gordon, D. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings F, 1993.

[6] S. Thrun, J. Langford, and D. Fox, "Monte Carlo hidden Markov models: Learning non-parametric models of partially observable stochastic processes," Proc. International Conference on Machine Learning (ICML), 1999.

[7] N. Jojic and B. J. Frey, "Topographic transformation as a discrete latent variable," Advances in Neural Information Processing Systems (NIPS), 1999.

[8] S. M. Omohundro, "Bumptrees for efficient function, constraint and classification learning," Advances in Neural Information Processing Systems (NIPS), 1990.

[9] F. De la Torre and M. J. Black, "Robust principal component analysis for computer vision," Proc. IEEE International Conference on Computer Vision (ICCV), 2001.

[10] A. J. Bell and T. J. Sejnowski, "The 'independent components' of natural scenes are edge filters," Vision Research, 1997.

[11] J. L. Bentley, "Multidimensional divide-and-conquer," Communications of the ACM, 1980.

[12] H. Murase and S. K. Nayar, "Visual learning and recognition of 3-D objects from appearance," International Journal of Computer Vision, 1995.

[13] D. B. Rubin, "Using the SIR algorithm to simulate posterior distributions," Bayesian Statistics 3, 1988.

[14] M. Isard and A. Blake, "CONDENSATION—conditional density propagation for visual tracking," International Journal of Computer Vision, 1998.

[15] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, 1991.

[16] J. B. Tenenbaum and W. T. Freeman, "Separating style and content with bilinear models," Neural Computation, 2000.