Integrating hierarchical feature selection and classifier training for multi-label image annotation

It is well accepted that using high-dimensional multi-modal visual features for image content representation and classifier training may achieve more sufficient characterization of the diverse visual properties of the images and further result in higher discrimination power of the classifiers. However, training the classifiers in a high-dimensional multi-modal feature space requires a large number of labeled training images, which will further result in the problem of curse of dimensionality. To tackle this problem, a hierarchical feature subset selection algorithm is proposed to enable more accurate image classification, where the processes for feature selection and classifier training are seamlessly integrated in a single framework. First, a feature hierarchy (i.e., concept tree for automatic feature space partition and organization) is used to automatically partition high-dimensional heterogeneous multi-modal visual features into multiple low-dimensional homogeneous single-modal feature subsets according to their certain physical meanings and each of them is used to characterize one certain type of the diverse visual properties of the images. Second, principal component analysis (PCA) is performed on each homogeneous singlemodal feature subset to select the most representative feature dimensions and a weak classifier is learned simultaneously. After the weak classifiers and their representative feature dimensions are available for all these homogeneous single-modal feature subsets, they are combined to generate an ensemble image classifier and achieve hierarchical feature subset selection. Our experiments on a specific domain of natural images have also obtained very positive results.

[1]  Brian D. Ripley,et al.  Neural Networks and Related Methods for Classification , 1994 .

[2]  J. Langford,et al.  FeatureBoost: A Meta-Learning Algorithm that Improves Model Robustness , 2000, ICML.

[3]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[4]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[5]  Jianping Fan,et al.  Automatic image annotation by incorporating feature hierarchy and boosting to scale up SVM classifiers , 2006, MM '06.

[6]  Jing Hua,et al.  Region-based Image Annotation using Asymmetrical Support Vector Machine-based Multiple-Instance Learning , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  B. S. Manjunath,et al.  Unsupervised Segmentation of Color-Texture Regions in Images and Video , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[9]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[10]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[12]  Andrew McCallum,et al.  Feature Bagging: Preventing Weight Undertraining in Structured Discriminative Learning , 2005 .

[13]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[14]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[15]  Jianping Fan,et al.  Multi-level annotation of natural scenes using dominant image components and semantic concepts , 2004, MULTIMEDIA '04.

[16]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[17]  W. Scott Spangler,et al.  Feature Weighting in k-Means Clustering , 2003, Machine Learning.

[18]  Wei-Ying Ma,et al.  A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieva , 2005, ICCV.

[19]  Ayhan Demiriz,et al.  Semi-Supervised Support Vector Machines , 1998, NIPS.

[20]  Huan Liu,et al.  Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[21]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[22]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[23]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[24]  Jitendra Malik,et al.  Blobworld: Image Segmentation Using Expectation-Maximization and Its Application to Image Querying , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Vladimir Pavlovic,et al.  A New Baseline for Image Annotation , 2008, ECCV.

[26]  Antonio Torralba,et al.  Semantic organization of scenes using discriminant structural templates , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[27]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[28]  David A. Forsyth,et al.  Learning the semantics of words and pictures , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[29]  Nuno Vasconcelos,et al.  Scalable discriminant feature selection for image retrieval and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[30]  Wei-Ying Ma,et al.  Iteratively clustering web images based on link and attribute reinforcements , 2005, ACM Multimedia.

[31]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[32]  Tommi S. Jaakkola,et al.  Information Regularization with Partially Labeled Data , 2002, NIPS.

[33]  Cyrus Shahabi,et al.  Yima: real-time multimedia storage and retrieval , 2002, MULTIMEDIA '02.

[34]  Yang Yu,et al.  Automatic image annotation using group sparsity , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Latifur Khan,et al.  Image annotations by combining multiple evidence & wordNet , 2005, ACM Multimedia.

[36]  Salvatore J. Stolfo,et al.  AdaCost: Misclassification Cost-Sensitive Boosting , 1999, ICML.