Describing Visual Scenes using Transformed Dirichlet Processes

Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object-centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, is based on an empirically effective Gibbs sampler. Applied to a dataset of partially labeled street scenes, we show that the TDP's inclusion of spatial structure improves detection performance, flexibly exploiting partially labeled training images.

[1]  Harry G. Barrow,et al.  Experiments in Interpretation-Guided Segmentation , 1977, Artificial Intelligence.

[2]  M. Escobar,et al.  Bayesian Density Estimation and Inference Using Mixtures , 1995 .

[3]  Paul A. Viola,et al.  Learning from one example through shared densities on transforms , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  Joshua B. Tenenbaum,et al.  Separating Style and Content with Bilinear Models , 2000, Neural Computation.

[5]  Brendan J. Frey,et al.  Learning flexible sprites in video layers , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[6]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[7]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, ICCV 2003.

[8]  Christopher K. I. Williams,et al.  Image Modeling with Position-Encoding Dynamic Trees , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[11]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[12]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[13]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[14]  Zhuowen Tu,et al.  Image Parsing: Unifying Segmentation, Detection, and Recognition , 2005, International Journal of Computer Vision.

[15]  Daniel P. Huttenlocher,et al.  Spatial priors for part-based recognition using statistical models , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Alexei A. Efros,et al.  Discovering object categories in image collections , 2005 .

[18]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[19]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[20]  Stuart J. Russell,et al.  BLOG: Probabilistic Models with Unknown Objects , 2005, IJCAI.

[21]  Stuart J. Russell,et al.  Probabilistic models with unknown objects , 2006 .

[22]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[23]  S. MacEachern,et al.  Bayesian Density Estimation and Inference Using Mixtures , 2007 .

[24]  Mary P. Harper,et al.  Spatial Random Tree Grammars for Modeling Hierarchal Structure in Images with Regions of Arbitrary Shape , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.