Statistical part-based models: theory and applications in image similarity, object detection and region labeling

The automatic analysis and indexing of visual content in unconstrained domains are important and challenging problems for a variety of multimedia applications. Much prior research addresses these problems by modeling images and videos as feature vectors, such as global histograms or block-based representations. Despite substantial research effort on analysis and indexing algorithms built on this representation, their performance remains unsatisfactory. This dissertation approaches the problem from a different perspective through a part-based representation, in which images and videos are represented as collections of parts together with their appearance and relational features. Such a representation is partly motivated by human vision research showing that the human visual system adopts a similar mechanism to perceive images. Although part-based representations have been investigated for decades, most prior work has focused on ad hoc or deterministic approaches. These require manual model design and often perform poorly on real-world images or videos because they cannot model uncertainty and noise. This thesis instead focuses on incorporating statistical modeling and machine learning techniques into the part-based paradigm, so as to reduce the burden of manual design, achieve robustness to content variation and noise, and maximize performance by learning from examples. We address three fundamental problems in visual content indexing and analysis: measuring the similarity of images, detecting objects and learning object models, and assigning semantic labels to regions in images. Throughout, we adopt a general graph-based representation for images and objects, called the Attributed Relational Graph (ARG), and develop new statistical algorithms based on this representation.
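
To make the ARG representation more concrete, here is a minimal sketch. It assumes that each part is summarized by a hypothetical (x, y, intensity) triple. Each node stores the part's appearance attribute, and each edge stores a relational attribute, here the relative displacement and distance between two parts. The class `ARG`, the function `build_arg`, and the choice of attributes are illustrative assumptions only. They are not the feature set used in the dissertation.

```python
from dataclasses import dataclass, field
import itertools
import math


@dataclass
class ARG:
    """Attributed Relational Graph: parts as nodes, pairwise relations as edges."""
    # node_attrs[i] holds the appearance attribute vector of part i
    node_attrs: list = field(default_factory=list)
    # edge_attrs[(i, j)] holds the relational attribute vector for parts i < j
    edge_attrs: dict = field(default_factory=dict)


def build_arg(parts):
    """Build a fully connected ARG from hypothetical (x, y, intensity) parts.

    Node attribute: the part's intensity (a stand-in appearance feature).
    Edge attribute: displacement (dx, dy) and Euclidean distance between parts.
    """
    arg = ARG()
    for _x, _y, intensity in parts:
        arg.node_attrs.append((intensity,))
    for i, j in itertools.combinations(range(len(parts)), 2):
        dx = parts[j][0] - parts[i][0]
        dy = parts[j][1] - parts[i][1]
        arg.edge_attrs[(i, j)] = (dx, dy, math.hypot(dx, dy))
    return arg


parts = [(0.0, 0.0, 0.8), (3.0, 4.0, 0.5), (6.0, 0.0, 0.2)]
g = build_arg(parts)
print(g.edge_attrs[(0, 1)])  # (3.0, 4.0, 5.0)
```

A fully connected graph is the simplest choice. A real system might instead connect only neighboring parts to keep the number of relational attributes manageable.
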
Our main contributions are as follows. First, we introduce a new principled similarity measure for ARGs that can be learned from training data. We establish a theoretical framework for similarity calculation and learning, and we apply the method to the detection of near-duplicate images. Second, we extend the ARG model and the traditional Random Graph to a new model, the Random Attributed Relational Graph (Random ARG), to represent object models. We show how to perform object detection by constructing Markov Random Fields (MRFs), mapping the model parameters onto them, and performing approximations with advanced inference and learning algorithms. Third, we explore a higher-order relational model and efficient inference algorithms for the region labeling problem, using scene text detection in video as a test case.
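
Region labeling with an MRF comes down to approximate inference over label variables attached to regions. Loopy belief propagation is one standard approximate inference method for such models. The sketch below is deliberately simpler than the higher-order model described above. It runs sum-product loopy belief propagation on a *pairwise* MRF with one potential shared by all edges. The labels could be, for example, text versus non-text for each region. The function name `loopy_bp`, the shared symmetric potential, and the toy numbers are assumptions made for illustration. None of this reproduces the dissertation's algorithm.

```python
import numpy as np


def loopy_bp(unary, edges, pairwise, n_iters=50, tol=1e-8):
    """Sum-product loopy belief propagation on a pairwise MRF.

    unary:    (N, K) non-negative node potentials phi_i(x_i)
    edges:    list of undirected (i, j) pairs
    pairwise: (K, K) potential psi(x_i, x_j) shared by all edges (assumed symmetric)
    Returns an (N, K) array of approximate marginals (beliefs).
    """
    unary = np.asarray(unary, dtype=float)
    n, k = unary.shape
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # msgs[(i, j)] is the message from node i to node j, initialised uniform
    msgs = {(i, j): np.ones(k) / k for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new_msgs = {}
        for i, j in msgs:
            # unary potential times all incoming messages except the one from j
            prod = unary[i].copy()
            for m in neighbors[i]:
                if m != j:
                    prod *= msgs[(m, i)]
            # sum over x_i of psi(x_i, x_j) * prod(x_i), then normalise
            msg = pairwise.T @ prod
            new_msgs[(i, j)] = msg / msg.sum()
        delta = max((np.abs(new_msgs[e] - msgs[e]).max() for e in msgs), default=0.0)
        msgs = new_msgs
        if delta < tol:  # messages have converged
            break
    beliefs = unary.copy()
    for i in range(n):
        for m in neighbors[i]:
            beliefs[i] *= msgs[(m, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)


# Three regions; region 0 is confidently labelled 0, neighbours prefer agreement.
unary = np.array([[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]])
smooth = np.array([[0.8, 0.2], [0.2, 0.8]])
print(loopy_bp(unary, [(0, 1), (1, 2)], smooth))
```

On a tree-structured graph such as this chain, the beliefs equal the exact marginals. On graphs with cycles they are only approximations, and convergence is not guaranteed. This limitation is one reason more elaborate inference schemes are of interest for higher-order models.
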