Feature Selection Using Correlation and Reliability Based Scoring Metric for Video Semantic Detection

Content-based multimedia retrieval faces many challenges, such as the semantic gap, imbalanced data, and the varying quality of media. Feature selection, as a component of the retrieval process, plays an important role. Its aim is to identify a subset of features by removing irrelevant or redundant ones. An effective feature subset can not only improve model performance and reduce computational complexity but also enhance semantic interpretability. To achieve these objectives, this paper proposes a novel metric that scores features for feature selection by integrating the correlation and reliability information between each feature and each class obtained from Multiple Correspondence Analysis (MCA). Based on these scores, a ranked list of features can be generated, and different selection criteria can then be adopted to select a feature subset. To evaluate the proposed framework, it is compared with four other well-known feature selection methods, namely information gain, the chi-square measure, correlation-based feature selection (CFS), and Relief, over five popular classifiers using the benchmark data from the TRECVID 2009 high-level feature extraction task. The results show that the proposed method outperforms the other methods in terms of classification accuracy, the size of the feature subspace, and the ability to capture semantic information.
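The abstract describes a filter pipeline: score every feature against the class, rank the features by score, then apply a selection criterion to the ranked list. The sketch below illustrates only that score, rank, select structure under stated assumptions. The MCA-based correlation and reliability metric is not specified here, so the sketch plugs in information gain (one of the baselines named above) as a stand-in scorer over already-discretized features; the proposed metric would take its place. All function names (`info_gain`, `rank_features`, `select_top_k`, `select_above`) and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[Sequence[int], Sequence[int]], float]


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values: Sequence[int], labels: Sequence[int]) -> float:
    """Stand-in scorer (NOT the paper's MCA metric): information gain of one
    discretized feature, H(C) - sum_v P(F=v) * H(C | F=v)."""
    n = len(labels)
    conditional = 0.0
    for v, count in Counter(values).items():
        subset = [y for x, y in zip(values, labels) if x == v]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns: Dict[str, Sequence[int]], labels: Sequence[int],
                  score: Scorer) -> List[Tuple[str, float]]:
    """Score every feature with `score` and return (name, score), best first."""
    scored = [(name, score(col, labels)) for name, col in columns.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def select_top_k(ranked: List[Tuple[str, float]], k: int) -> List[str]:
    """Selection criterion 1: keep the k highest-scoring features."""
    return [name for name, _ in ranked[:k]]


def select_above(ranked: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Selection criterion 2: keep every feature scoring above a threshold."""
    return [name for name, s in ranked if s > threshold]


# Toy usage: three discretized features over six labeled shots (made-up data).
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "f_const": [2, 2, 2, 2, 2, 2],  # carries no class information
    "f_noisy": [0, 1, 0, 1, 0, 1],  # weakly related to the class
    "f_match": [1, 1, 0, 0, 1, 0],  # mirrors the class exactly
}
ranked = rank_features(columns, labels, info_gain)
print(select_top_k(ranked, 2))  # ['f_match', 'f_noisy']
```

Keeping the scorer as a parameter separates the scoring metric from the selection criterion, which mirrors the abstract's point that one ranked list can feed several different selection criteria.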
