Joint appearance and locality image representation by Gaussianization

A novel image representation is proposed in this thesis to capture both appearance and locality information for image classification applications. First, we model the feature vectors at several granularity levels (the corpus level, the image level, and the image patch level) in a hierarchical Bayesian framework using mixtures of Gaussians. After this hierarchical Gaussianization, each image is represented by a Gaussian mixture model (GMM) for its appearance and by several Gaussian maps for its spatial layout. We then extract the appearance information from the GMM parameters, and the locality information from global and local statistics over the Gaussian maps. Finally, we employ a supervised dimension reduction technique called DAP (discriminant adaptive projection) to remove noise directions and to further enhance the discriminating power of the representation. To validate the claim that the new representation is a general representation for images and video frames, we evaluate it on several important applications. First, we apply the new representation to classification and regression tasks that take whole images as inputs: object recognition, scene category classification, face recognition, age estimation, pose estimation, gender recognition, and video event recognition. We then test it on object detection and image parsing tasks, where the representation takes partial images as inputs. The experimental results show that, across these image types and tasks, the proposed representation outperformed the other state-of-the-art algorithms it was compared with in every application.
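The pipeline described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract gives no formulas, so the diagonal-covariance GMM, the MAP-style mean adaptation with a `relevance` factor, the normalization in `appearance_vector`, and the use of posterior-weighted patch coordinates as a "global statistic" in `locality_vector` are all assumptions chosen for illustration, and every function name is hypothetical. The DAP step is omitted.

```python
import numpy as np

def posteriors(X, w, mu, var):
    """Responsibilities (N x K) of K diagonal Gaussians for N patch features.

    Reshaped onto the patch grid, column k plays the role of one
    "Gaussian map" (assumed interpretation).
    """
    diff = X[:, None, :] - mu[None, :, :]                  # N x K x D
    log_pdf = -0.5 * (np.sum(diff ** 2 / var, axis=2)
                      + np.sum(np.log(2.0 * np.pi * var), axis=1))
    log_joint = np.log(w) + log_pdf                        # N x K
    log_joint -= log_joint.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

def adapt_means(X, post, mu, relevance=16.0):
    """Pull the corpus-level means toward one image's patches (assumed MAP rule)."""
    n_k = post.sum(axis=0)                                 # soft counts per component
    x_bar = post.T @ X / np.maximum(n_k, 1e-12)[:, None]   # K x D weighted means
    alpha = (n_k / (n_k + relevance))[:, None]             # trust in the image data
    return alpha * x_bar + (1.0 - alpha) * mu

def appearance_vector(mu_img, w, var):
    """Stack the image-level means, scaled by weight and variance (assumed)."""
    return (np.sqrt(w)[:, None] * mu_img / np.sqrt(var)).ravel()

def locality_vector(post, coords):
    """Posterior-weighted mean patch position per component (assumed statistic)."""
    n_k = np.maximum(post.sum(axis=0), 1e-12)
    return (post.T @ coords / n_k[:, None]).ravel()

# Toy usage: random numbers stand in for patch descriptors of one image.
rng = np.random.default_rng(0)
K, D, N = 3, 4, 50
w = np.full(K, 1.0 / K)                 # corpus-level mixture weights
mu = rng.normal(size=(K, D))            # corpus-level means
var = np.ones((K, D))                   # corpus-level diagonal variances
X = rng.normal(size=(N, D))             # patch features
coords = rng.uniform(size=(N, 2))       # normalized (x, y) patch centers

post = posteriors(X, w, mu, var)
mu_img = adapt_means(X, post, mu)
feature = np.concatenate([appearance_vector(mu_img, w, var),
                          locality_vector(post, coords)])
```

In this sketch the final `feature` has length K·D + 2K; a supervised projection such as the DAP step named in the abstract would then be learned on such vectors.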
