Universal and Adapted Vocabularies for Generic Visual Categorization

Generic visual categorization (GVC) is the pattern classification problem that consists in assigning labels to an image based on its semantic content. This is a challenging task as one has to deal with inherent object/scene variations, as well as changes in viewpoint, lighting, and occlusion. Several state-of-the-art GVC systems use a vocabulary of visual terms to characterize images with a histogram of visual word counts. We propose a novel practical approach to GVC based on a universal vocabulary, which describes the content of all the considered classes of images, and class vocabularies obtained through the adaptation of the universal vocabulary using class-specific data. The main novelty is that an image is characterized by a set of histograms - one per class - where each histogram describes whether the image content is best modeled by the universal vocabulary or the corresponding class vocabulary. This framework is applied to two types of local image features: low-level descriptors such as the popular SIFT and high-level histograms of word co-occurrences in a spatial neighborhood. It is shown experimentally on two challenging data sets (an in-house database of 19 categories and the PASCAL VOC 2006 data set) that the proposed approach exhibits state-of-the-art performance at a modest computational cost.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[3]  Alfredo Petrosino,et al.  Image Analysis and Processing , 2016, Springer US.

[4]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[5]  Jeff A. Bilmes,et al.  A gentle tutorial of the em algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models , 1998 .

[6]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[7]  Jitendra Malik,et al.  Recognizing surfaces using three-dimensional textons , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[8]  Nello Cristianini,et al.  Advances in Kernel Methods - Support Vector Learning , 1999 .

[9]  Philip C. Woodland,et al.  Speaker adaptation: techniques and challenges , 1999 .

[10]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[11]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[12]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[13]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[15]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[16]  Andrew Zisserman,et al.  A Statistical Approach to Texture Classification from Single Images , 2004, International Journal of Computer Vision.

[17]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[18]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[19]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[20]  S. Lazebnik,et al.  Local Features and Kernels for Classification of Texture and Object Categories: An In-Depth Study , 2005 .

[21]  Raphaël Marée,et al.  Random subwindows for robust image classification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[22]  Gabriela Csurka,et al.  Incorporating Geometry Information with Weak Classifiers for Improved Generic Visual Categorization , 2005, ICIAP.

[23]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[24]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  John Shawe-Taylor,et al.  Improving "bag-of-keypoints" image categorisation: Generative Models and PDF-Kernels , 2005 .

[26]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[28]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[29]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[30]  Lawrence Carin,et al.  Sparse multinomial logistic regression: fast algorithms and generalization bounds , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Jean-Marc Odobez,et al.  Constructing Visual Models with a Latent Space Approach , 2005, SLSFS.

[32]  Cordelia Schmid,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[33]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[34]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[35]  Frédéric Jurie,et al.  Learning Saliency Maps for Object Categorization , 2006 .

[36]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[37]  Ankur Agarwal,et al.  Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[38]  Frédéric Jurie,et al.  Latent mixture vocabularies for object categorization and segmentation , 2006, Image Vis. Comput..