Automatic image annotation by incorporating feature hierarchy and boosting to scale up SVM classifiers

The performance of image classifiers largely depends on two interrelated issues: (1) suitable frameworks for image content representation and automatic feature extraction; (2) effective algorithms for image classifier training and feature subset selection. To address the first issue, a multiresolution grid-based framework is proposed for image content representation and feature extraction, bypassing the time-consuming and error-prone process of image segmentation. To address the second issue, a hierarchical boosting algorithm is proposed that incorporates feature hierarchy and boosting to scale up SVM image classifier training in high-dimensional feature space. The high-dimensional, multi-modal, heterogeneous visual features are partitioned into multiple low-dimensional, single-modal, homogeneous feature subsets, each of which characterizes a certain visual property of images. For each homogeneous feature subset, principal component analysis (PCA) is performed to exploit the feature correlations, and a weak classifier is learned simultaneously. Once the weak classifiers for the different feature subsets and grid sizes are available, they are combined by boosting into an optimal classifier for the given object class or image concept, and the most representative feature subsets and grid sizes are selected. Our experiments on a specific domain of natural images have obtained very positive results.
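
The idea of boosting over homogeneous feature subsets can be illustrated with a minimal sketch. It is not the authors' implementation: a weighted nearest-centroid rule stands in for the PCA-plus-SVM weak classifier, the multiresolution grid is omitted, and the subset names (`color`, `texture`), the synthetic data, and all function names are assumptions made for illustration. Each AdaBoost-style round trains one weak classifier per feature subset, keeps the one with the lowest weighted error, and reweights the samples; the subsets that end up in the ensemble are the selected ones.

```python
import math
import random


def centroid_classifier(X, y, w, cols):
    """Weighted nearest-centroid weak learner restricted to columns `cols`.

    Stand-in (assumption) for the per-subset PCA + SVM weak classifier.
    """
    def centroid(label):
        total = sum(wi for wi, yi in zip(w, y) if yi == label)
        return [
            sum(wi * x[c] for x, wi, yi in zip(X, w, y) if yi == label) / total
            for c in cols
        ]

    pos, neg = centroid(1), centroid(-1)

    def classify(x):
        d_pos = sum((x[c] - p) ** 2 for c, p in zip(cols, pos))
        d_neg = sum((x[c] - n) ** 2 for c, n in zip(cols, neg))
        return 1 if d_pos <= d_neg else -1

    return classify


def boost_over_subsets(X, y, subsets, rounds=5):
    """AdaBoost-style loop that picks one feature subset per round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # entries are (alpha, subset name, weak classifier)
    for _ in range(rounds):
        best = None
        for name, cols in subsets.items():
            h = centroid_classifier(X, y, w, cols)
            err = sum(wi for wi, x, yi in zip(w, X, y) if h(x) != yi)
            if best is None or err < best[0]:
                best = (err, name, h)
        err, name, h = best
        # Clip the error so the log below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        if err >= 0.5:
            break  # no subset beats chance on the current weights
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, name, h))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * h(x)) for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble, x):
    """Sign of the alpha-weighted vote of the selected weak classifiers."""
    score = sum(alpha * h(x) for alpha, _, h in ensemble)
    return 1 if score >= 0 else -1


# Synthetic data (assumption): columns 0-1 carry the class signal,
# columns 2-3 are pure noise.
random.seed(0)
subsets = {"color": [0, 1], "texture": [2, 3]}
X, y = [], []
for i in range(40):
    label = 1 if i % 2 == 0 else -1
    color = [random.gauss(2.0 * label, 0.3) for _ in range(2)]
    texture = [random.gauss(0.0, 1.0) for _ in range(2)]
    X.append(color + texture)
    y.append(label)

ensemble = boost_over_subsets(X, y, subsets, rounds=3)
selected = sorted({name for _, name, _ in ensemble})
```

Here `selected` lists the feature subsets that received a vote in the boosted classifier, which is the sketch's analogue of selecting the most representative feature subsets. Extending the `subsets` dictionary with one entry per (feature subset, grid size) pair would, under the same assumptions, let the loop choose grid sizes as well.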
