Image Parsing: Unifying Segmentation, Detection, and Recognition

In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and reconfigures it dynamically using a set of moves, most of which are reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulate the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter compute discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns, including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition. (If we use generic visual patterns only, then image parsing corresponds to image segmentation; see Tu and Zhu, 2002. IEEE Trans. PAMI, 24(5):657–673.) We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.
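The division of labor described in the abstract, where the generative posterior is the invariant distribution and the discriminative probabilities serve as proposals, can be illustrated with a toy Metropolis-Hastings sampler. This is only a minimal sketch under strong assumptions, not the paper's algorithm: the state is a single label for one region rather than a parsing graph, there are no reversible jumps between graphs of different structure, and the label names and all numerical values are hypothetical.

```python
import random


def mh_step(state, posterior, proposal, rng):
    """One Metropolis-Hastings step with a state-independent proposal.

    posterior: unnormalized target weights (stands in for the generative model).
    proposal:  bottom-up proposal probabilities (stands in for discriminative cues).
    """
    labels = list(proposal)
    weights = [proposal[k] for k in labels]
    candidate = rng.choices(labels, weights=weights, k=1)[0]
    # Acceptance ratio for an independence sampler:
    # p(candidate) * q(state) / (p(state) * q(candidate)).
    ratio = (posterior[candidate] * proposal[state]) / (
        posterior[state] * proposal[candidate]
    )
    if rng.random() < min(1.0, ratio):
        return candidate
    return state


def run_chain(posterior, proposal, n_steps, seed=0):
    """Run the chain and return the empirical frequency of each label."""
    rng = random.Random(seed)
    state = next(iter(posterior))
    counts = {k: 0 for k in posterior}
    for _ in range(n_steps):
        state = mh_step(state, posterior, proposal, rng)
        counts[state] += 1
    return {k: c / n_steps for k, c in counts.items()}


# Hypothetical values for a single image region (assumed for illustration only).
posterior = {"texture": 0.2, "shading": 0.1, "face": 0.6, "text": 0.1}
proposal = {"texture": 0.3, "shading": 0.1, "face": 0.5, "text": 0.1}

freqs = run_chain(posterior, proposal, n_steps=20000)
print(freqs)
```

The acceptance test is what keeps the posterior invariant: the proposal only affects how quickly the chain reaches likely states, so the empirical frequencies approach the normalized posterior even though the proposal differs from it.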
