Synergy between Object Recognition and Image Segmentation Using the Expectation-Maximization Algorithm

In this work, we formulate the interaction between image segmentation and object recognition in the framework of the Expectation-Maximization (EM) algorithm. We treat segmentation as the assignment of image observations to object hypotheses and cast it as the E-step, while the M-step fits the object models to the observations. The two steps are performed iteratively, thereby simultaneously segmenting an image and reconstructing it in terms of objects. We model objects using Active Appearance Models (AAMs), as they capture both shape and appearance variation. During the E-step, the fidelity of the AAM predictions to the image determines whether observations are assigned to the object. For this, we propose two top-down segmentation algorithms. The first starts from an oversegmentation of the image and then softly assigns image segments to objects, as in the common setting of EM. The second uses curve evolution to minimize a criterion derived from the variational interpretation of EM and incorporates AAMs as shape priors. For the M-step, we derive AAM fitting equations that accommodate segmentation information, thereby allowing occlusions to be handled automatically. In addition to top-down segmentation results, we provide systematic object detection experiments that validate the merits of our joint segmentation and recognition approach.
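To make the first E-step variant (soft assignment of oversegmented regions to an object hypothesis) concrete, here is a minimal Python sketch. It is not the paper's formulation: it assumes a Gaussian noise model around the intensities synthesized by the fitted object model, a constant-mean Gaussian background, and independent pixels. The function names, the flat-list image layout, and the parameters `bg_mean`, `sigma_obj`, `sigma_bg` and `prior_obj` are hypothetical.

```python
import math


def gaussian_loglik(values, means, sigma):
    """Sum of log N(v; m, sigma^2) over paired observations and predictions."""
    var = sigma * sigma
    return sum(-0.5 * math.log(2.0 * math.pi * var) - (v - m) ** 2 / (2.0 * var)
               for v, m in zip(values, means))


def stable_sigmoid(d):
    """1 / (1 + exp(-d)), computed without overflow for large |d|."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def e_step(image, prediction, segments, bg_mean, sigma_obj, sigma_bg, prior_obj=0.5):
    """Return P(object | segment) for every segment of an oversegmentation.

    image, prediction: flat lists of observed intensities and of intensities
        synthesized by the fitted object model (e.g. an AAM); hypothetical layout.
    segments: dict mapping a segment id to the list of its pixel indices.
    Both noise models are assumed Gaussian for illustration only.
    """
    posteriors = {}
    for seg_id, pixels in segments.items():
        obs = [image[p] for p in pixels]
        pred = [prediction[p] for p in pixels]
        # Log joint of the whole segment under each hypothesis (pixels assumed independent).
        log_obj = gaussian_loglik(obs, pred, sigma_obj) + math.log(prior_obj)
        log_bg = (gaussian_loglik(obs, [bg_mean] * len(obs), sigma_bg)
                  + math.log(1.0 - prior_obj))
        # With two hypotheses, the posterior is the sigmoid of the log-odds.
        posteriors[seg_id] = stable_sigmoid(log_obj - log_bg)
    return posteriors
```

Segments whose intensities the object model reconstructs well receive posteriors near 1, while segments better explained by the background receive posteriors near 0. Because whole segments, rather than single pixels, are scored, the assignment inherits the boundaries of the oversegmentation; the resulting posteriors can serve as per-pixel weights in a subsequent model-fitting step.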
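The M-step idea of letting segmentation information enter model fitting can be illustrated on the linear appearance part of an AAM, A(x) = A_0(x) + sum_i lambda_i A_i(x). The Python sketch below is a deliberate simplification. It ignores the shape warp, fits appearance coefficients only, and weights each pixel's squared residual by an E-step posterior, so pixels assigned to the background (for example, occluders) barely influence the fit. The weighted least-squares formulation and the helper names `weighted_appearance_fit` and `solve_linear` are assumptions for illustration, not the paper's derived fitting equations.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: too few effectively weighted pixels")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def weighted_appearance_fit(image, mean_app, basis, weights):
    """Coefficients minimizing sum_x w(x) * (I(x) - A0(x) - sum_i lam_i * A_i(x))**2.

    image, mean_app: flat lists of pixel intensities (I and A0).
    basis: list of appearance modes A_i, each a flat list like image.
    weights: per-pixel posteriors of belonging to the object (hypothetical input).
    """
    k = len(basis)
    residual = [i - a0 for i, a0 in zip(image, mean_app)]
    # Normal equations: (B^T W B) lam = B^T W r, with W = diag(weights).
    lhs = [[sum(w * bi * bj for w, bi, bj in zip(weights, basis[i], basis[j]))
            for j in range(k)] for i in range(k)]
    rhs = [sum(w * bi * r for w, bi, r in zip(weights, basis[i], residual))
           for i in range(k)]
    return solve_linear(lhs, rhs)
```

For instance, with a single constant mode, image values [2, 2, 2, 100] and weights [1, 1, 1, 0], the occluded fourth pixel is ignored and the coefficient follows the three visible pixels; with uniform weights the outlier drags it far away. Down-weighting rather than hard-deleting pixels keeps the fit a smooth function of the E-step output.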
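The second E-step variant builds on the variational interpretation of EM. A standard form of that interpretation (the free-energy bound of Neal and Hinton) is sketched below under two assumptions: pixels are independent, and there are two hypotheses h in {o, b} (object, background), with P(I(x), o | theta) = pi_o P_o(I(x) | theta) and P(I(x), b) = pi_b P_b(I(x)). The E-step maximizes F over Q with theta fixed, and the M-step maximizes it over theta. Restricting Q to hard assignments given by the region Omega enclosed by an evolving curve C makes the entropy term vanish and leaves a region-competition-style energy; maximizing F is the same as minimizing -F, which is the sense in which curve evolution minimizes a criterion. This is a generic derivation under those assumptions; the paper's actual criterion may include further terms, such as the AAM-based shape prior or a boundary-length penalty.

```latex
% Free-energy lower bound on the log-likelihood, for any per-pixel distribution Q_x(h)
\log P(I \mid \theta) \;\ge\; \mathcal{F}(Q,\theta)
  = \sum_{x} \sum_{h \in \{o,b\}} Q_x(h)\,\log P\big(I(x), h \mid \theta\big)
  \;-\; \sum_{x} \sum_{h \in \{o,b\}} Q_x(h)\,\log Q_x(h)

% Hard assignment: Q_x(o) = 1 inside \Omega, Q_x(b) = 1 outside, so the entropy term is 0
\mathcal{F}(\Omega,\theta)
  = \int_{\Omega} \log\big(\pi_o\, P_o(I(x) \mid \theta)\big)\,dx
  \;+\; \int_{\Omega^{c}} \log\big(\pi_b\, P_b(I(x))\big)\,dx

% Gradient ascent on \Omega: the curve C moves along its outward unit normal \mathcal{N}
\frac{\partial C}{\partial t}
  = \Big[\log\big(\pi_o\, P_o(I(x) \mid \theta)\big) - \log\big(\pi_b\, P_b(I(x))\big)\Big]\,\mathcal{N}
```

The curve expands where the object hypothesis explains a pixel better than the background and contracts otherwise; in a level-set implementation the same speed term drives the evolution of the embedding function.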
