Learning models of object structure

We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images.

[1]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[2]  M. B. Clowes,et al.  On Seeing Things , 1971, Artif. Intell..

[3]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[4]  Shimon Ullman,et al.  Recognizing solid objects by alignment with an image , 1990, International Journal of Computer Vision.

[5]  Kokichi Sugihara,et al.  A Necessary and Sufficient Condition for a Picture to Represent a Polyhedral Scene , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  P. Green,et al.  Trans-dimensional Markov chain Monte Carlo , 2000 .

[7]  Jun S. Liu,et al.  Monte Carlo strategies in scientific computing , 2001 .

[8]  P. Green Reversible jump Markov chain Monte Carlo computation and Bayesian model determination , 1995 .

[9]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[11]  Jitendra Malik,et al.  Recovering 3D human body configurations using shape contexts , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Silvio Savarese,et al.  View Synthesis for Recognizing Unseen Poses of Object Classes , 2008, ECCV.

[13]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[14]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[17]  Radford M. Neal Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .

[18]  Charles Kemp,et al.  The discovery of structural form , 2008, Proceedings of the National Academy of Sciences.

[19]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[20]  Zhuowen Tu,et al.  Image Segmentation by Data-Driven Markov Chain Monte Carlo , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Cristian Sminchisescu,et al.  Kinematic jump processes for monocular 3D human tracking , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[22]  Martial Hebert,et al.  Beyond Local Appearance: Category Recognition from Pairwise Interactions of Simple Features , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  J. Tenenbaum,et al.  Theory-based Bayesian models of inductive learning and reasoning , 2006, Trends in Cognitive Sciences.

[24]  David G. Lowe,et al.  Fitting Parameterized Three-Dimensional Models to Images , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[26]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[27]  Derek Hoiem,et al.  3D LayoutCRF for Multi-View Object Class Recognition and Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Cristian Sminchisescu,et al.  Estimating Articulated Human Motion with Covariance Scaled Sampling , 2003, Int. J. Robotics Res..

[29]  Daniel P. Huttenlocher,et al.  Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition , 2006, ECCV.

[30]  Gunilla Borgefors,et al.  Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm , 1988, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Cordelia Schmid,et al.  Flexible Object Models for Category-Level 3D Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Long Zhu,et al.  Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing , 2006, NIPS.

[33]  David A. Forsyth,et al.  The Joy of Sampling , 2004, International Journal of Computer Vision.

[34]  Patrick Henry Winston,et al.  Learning structural descriptions from examples , 1970 .

[35]  Daniel P. Huttenlocher,et al.  Comparing Images Using the Hausdorff Distance , 1993, IEEE Trans. Pattern Anal. Mach. Intell..