Learning About Multiple Objects in Images: Factorial Learning without Factorial Search

We consider data which are images containing views of multiple objects. Our task is to learn about each of the objects present in the images. This task can be approached as a factorial learning problem, where each image must be explained by instantiating a model for each of the objects present with the correct instantiation parameters. A major problem with learning a factorial model is that as the number of objects increases, there is a combinatorial explosion of the number of configurations that need to be considered. We develop a method to extract object models sequentially from the data by making use of a robust statistical method, thus avoiding the combinatorial explosion, and present results showing successful extraction of objects from real images.

[1]  B. Frey,et al.  Transformation-Invariant Clustering Using the EM Algorithm , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Zoubin Ghahramani,et al.  Factorial Learning and the EM Algorithm , 1994, NIPS.

[3]  Brendan J. Frey,et al.  Learning flexible sprites in video layers , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[4]  Thomas G. Dietterich,et al.  Editors. Advances in Neural Information Processing Systems , 2002 .

[5]  Brendan J. Frey,et al.  Estimating mixture models of images and inferring spatial transformations using the EM algorithm , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[6]  Michael J. Black,et al.  EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation , 1996, International Journal of Computer Vision.

[7]  Christoph von der Malsburg,et al.  Are object shape primitives learnable? , 1999, Neurocomputing.

[8]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[9]  Terrence J. Sejnowski,et al.  Unsupervised Learning , 2018, Encyclopedia of GIS.

[10]  Geoffrey E. Hinton,et al.  Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.

[11]  Andrew Blake,et al.  Statistical Background Modelling for Tracking with a Virtual Camera , 1995, BMVC.