Paper information - Describing Visual Scenes using Transformed Dirichlet Processes

Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components is shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object-centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, are based on an empirically effective Gibbs sampler. On a dataset of partially labeled street scenes, we show that the TDP's inclusion of spatial structure improves detection performance by flexibly exploiting partially labeled training images.
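The generative idea in the abstract (components shared across images, each placed in an image by a random transformation, with an unbounded number of instances) can be illustrated with a toy sketch. This is not the paper's model or its Gibbs sampler: it assumes a single-level Chinese restaurant process in place of the hierarchical DP, pure 2-D translations as the transformations, and isotropic Gaussians as the components. The names `SHARED_PARTS`, `crp_assignments`, and `sample_image` are hypothetical and chosen only for this illustration.

```python
import random

# Hypothetical shared components: each category is an isotropic 2-D Gaussian
# over feature positions in an object-centered frame (mean, std).
SHARED_PARTS = {"car": ((0.0, 0.0), 1.0), "person": ((0.0, 0.0), 0.3)}


def crp_assignments(n, alpha, rng):
    """Assign n features to instances with a Chinese restaurant process.

    The number of distinct instances is not fixed in advance; it grows
    with n at a rate controlled by the concentration parameter alpha.
    """
    counts, labels = [], []
    for _ in range(n):
        weights = counts + [alpha]  # existing instances, then a new one
        r = rng.random() * sum(weights)
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if k == len(counts):
            counts.append(1)  # open a new instance
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def sample_image(n_features, alpha, rng):
    """Sample one toy image: object instances plus feature positions."""
    labels = crp_assignments(n_features, alpha, rng)
    instances = []
    for _ in range(max(labels) + 1):
        category = rng.choice(sorted(SHARED_PARTS))  # reuse a shared component
        shift = (rng.uniform(0, 100), rng.uniform(0, 100))  # per-image translation
        instances.append((category, shift))
    features = []
    for k in labels:
        category, (tx, ty) = instances[k]
        (mx, my), std = SHARED_PARTS[category]
        # Feature position = object-centered component shifted into image coordinates.
        features.append((k, rng.gauss(mx + tx, std), rng.gauss(my + ty, std)))
    return instances, features


if __name__ == "__main__":
    rng = random.Random(0)
    instances, features = sample_image(20, 1.0, rng)
    print(len(instances), "instances,", len(features), "features")
```

Because the components in `SHARED_PARTS` are reused across calls to `sample_image` while the translations are redrawn each time, the sketch mirrors, in a very reduced form, the separation between object appearance and object position that the abstract describes.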
