Image Parsing: Unifying Segmentation, Detection, and Recognition

We propose a general framework for parsing images into regions and objects. In this framework, the detection and recognition of objects proceed simultaneously with image segmentation in a competitive and cooperative manner. We illustrate our approach on natural images of complex city scenes where the objects of primary interest are faces and text. This method makes use of bottom-up proposals combined with top-down generative models using the data driven Markov chain Monte Carlo (DDMCMC) algorithm, which is guaranteed to converge to the optimal estimate asymptotically. More precisely, we define generative models for faces, text, and generic regions- e.g. shading, texture, and clutter. These models are activated by bottom-up proposals. The proposals for faces and text are learnt using a probabilistic version of AdaBoost. The DDMCMC combines reversible jump and diffusion dynamics to enable the generative models to explain the input images in a competitive and cooperative manner. Our experiments illustrate the advantages and importance of combining bottom-up and top-down models and of performing segmentation and object detection/recognition simultaneously.

[1]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[2]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[3]  S. Ullman Visual routines , 1984, Cognition.

[4]  S. Geman,et al.  Diffusions for global optimizations , 1986 .

[5]  Wayne Niblack,et al.  An introduction to digital image processing , 1986 .

[6]  Anne Treisman,et al.  Features and objects in visual processing , 1986 .

[7]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  C. Hwang,et al.  Diffusion for global optimization in R n , 1987 .

[9]  Ulf Grenander,et al.  Hands: A Pattern Theoretic Study of Biological Shapes , 1990 .

[10]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[11]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[12]  Harris Drucker,et al.  Boosting Performance in Neural Networks , 1993, Int. J. Pattern Recognit. Artif. Intell..

[13]  Geoffrey E. Hinton,et al.  The Helmholtz Machine , 1995, Neural Computation.

[14]  P. Green Reversible jump Markov chain Monte Carlo computation and Bayesian model determination , 1995 .

[15]  Elie Bienenstock,et al.  Compositionality, MDL Priors, and Object Recognition , 1996, NIPS.

[16]  Geoffrey E. Hinton,et al.  Using Generative Models for Handwritten Digit Recognition , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[18]  Alan L. Yuille,et al.  Region Competition: Unifying Snakes, Region Growing, and Bayes/MDL for Multiband Image Segmentation , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Denis Fize,et al.  Speed of processing in the human visual system , 1996, Nature.

[20]  Alex Pentland,et al.  Probabilistic Visual Learning for Object Representation , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Ellen K. Hughes,et al.  Video OCR for digital news archive , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[23]  Ellen K. Hughes,et al.  Video OCR for Digital News Archives , 1998 .

[24]  Harry Wechsler,et al.  The FERET database and evaluation procedure for face-recognition algorithms , 1998, Image Vis. Comput..

[25]  Anil K. Jain,et al.  Automatic text location in images and video frames , 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).

[26]  A. Yuille,et al.  Two- and Three-Dimensional Patterns of the Face , 2001 .

[27]  Dorin Comaniciu,et al.  Mean shift analysis and applications , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[28]  Y. Freund,et al.  Discussion of the Paper \additive Logistic Regression: a Statistical View of Boosting" By , 2000 .

[29]  Narendra Ahuja,et al.  Face detection using mixtures of linear subspaces , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[30]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[31]  Pietro Perona,et al.  Towards automatic discovery of object categories , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[32]  Pietro Perona,et al.  Viewpoint-invariant learning and detection of human heads , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[33]  John Odentrantz,et al.  Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues , 2000, Technometrics.

[34]  W. Freeman,et al.  Generalized Belief Propagation , 2000, NIPS.

[35]  Rong Zhang,et al.  Integrating bottom-up/top-down for object recognition by data driven Markov chain Monte Carlo , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[36]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[37]  Serge J. Belongie,et al.  Matching shapes , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[38]  David A. Forsyth,et al.  Learning the semantics of words and pictures , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[39]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[40]  Paul A. Viola,et al.  Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade , 2001, NIPS.

[41]  Jun S. Liu,et al.  Monte Carlo strategies in scientific computing , 2001 .

[42]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[43]  Refractor Vision , 2000, The Lancet.

[44]  Sean Dougherty,et al.  Edge Detector Evaluation Using Empirical ROC Curves , 2001, Comput. Vis. Image Underst..

[45]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[46]  Zhuowen Tu,et al.  Parsing Images into Region and Curve Processes , 2002, ECCV.

[47]  Dan Klein,et al.  A Generative Constituent-Context Model for Improved Grammar Induction , 2002, ACL.

[48]  P. Perona,et al.  Rapid natural scene categorization in the near absence of attention , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[49]  Zhuowen Tu,et al.  Image Segmentation by Data-Driven Markov Chain Monte Carlo , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[50]  James M. Rehg,et al.  Learning a Rare Event Detection Cascade by Direct Feature Selection , 2003, NIPS.

[51]  Alan L. Yuille,et al.  Statistical Edge Detection: Learning and Evaluating Edge Cues , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[52]  Antonio Torralba,et al.  Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes , 2003, NIPS.

[53]  Adrian Barbu,et al.  Graph partition by Swendsen-Wang cuts , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[54]  Robert E. Schapire,et al.  The Boosting Approach to Machine Learning An Overview , 2003 .

[55]  Feng Han,et al.  Bayesian reconstruction of 3D shapes and scenes from a single image , 2003, First IEEE International Workshop on Higher-Level Knowledge in 3D Modeling and Motion Analysis, 2003. HLK 2003..

[56]  Jitendra Malik,et al.  Contour and Texture Analysis for Image Segmentation , 2001, International Journal of Computer Vision.

[57]  Alan L. Yuille,et al.  AdaBoost Learning for Detecting and Reading Text in City Scenes , 2004, CVPR 2004.

[58]  Multigrid and multi-level Swendsen-Wang cuts for hierarchic graph partition , 2004, CVPR 2004.

[59]  J. Ponce,et al.  Towards true 3D object recognition , 2004 .

[60]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[61]  Donald Geman,et al.  Coarse-to-Fine Face Detection , 2004, International Journal of Computer Vision.

[62]  Zhuowen Tu,et al.  Shape Matching and Recognition - Using Generative Models and Informative Features , 2004, ECCV.