Adaptive Iterative Learning for Classification Based on Feature Selection and Combination Voting

Feature selection is an active research area in machine learning for high-dimensional data analysis. The idea is to perform learning only on the top-ranked features instead of the entire original feature space. This improves understanding of the inherent characteristics of such datasets and reduces computational cost. Most research efforts focus on how to select the proper features for learning. In this paper we study a different question: can the "unimportant" (low-rank) features also provide useful information that improves overall learning capability? To address it, we propose an adaptive iterative learning mechanism based on feature selection and combination voting (AdaFSCV). Instead of discarding the unselected low-rank features, as is conventional, we iteratively build classifiers in those feature spaces as well. This iterative process adaptively learns information from different feature spaces. It stops automatically when a classifier can no longer perform better than a random guess. Finally, a probability voting algorithm combines the votes of all classifiers to produce the final prediction. Simulation results on the MNIST database of handwritten digits show that this method can improve classification accuracy and robustness at the cost of some additional computation.
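The iterative mechanism described above can be sketched in a few dozen lines. This is a minimal, non-authoritative Python sketch, and it rests on several assumptions that the abstract does not state:

- Features are ranked by a hypothetical score, the spread of per-class feature means.
- The ranked list is split into consecutive blocks of `block_size` features.
- Each block gets a hypothetical nearest-centroid base classifier. A softmax over its negative squared distances serves as the class probabilities.
- "Better than a random guess" is checked as held-out accuracy above 1/K for K classes.
- Probability voting is an unweighted sum of the classifiers' probability vectors.

The names `rank_features`, `NearestCentroid`, `fit_adafscv_sketch`, and `vote` are illustrative only. The adaptive weighting and base learner of the actual AdaFSCV are not reproduced here.

```python
import math
from collections import defaultdict


def rank_features(X, y):
    """Hypothetical ranking score: spread (max - min) of per-class feature means."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        means = []
        for c in classes:
            vals = [x[j] for x, label in zip(X, y) if label == c]
            means.append(sum(vals) / len(vals))
        scores.append(max(means) - min(means))
    # Feature indices, highest score first (ties keep their original order).
    return sorted(range(len(X[0])), key=lambda j: scores[j], reverse=True)


class NearestCentroid:
    """Hypothetical base learner that only looks at one block of features."""

    def __init__(self, features):
        self.features = features

    def fit(self, X, y):
        self.centroids = {}
        for c in sorted(set(y)):
            rows = [[x[j] for j in self.features] for x, label in zip(X, y) if label == c]
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        # Softmax over negative squared distances to each class centroid.
        sub = [x[j] for j in self.features]
        logits = {c: -sum((a - b) ** 2 for a, b in zip(sub, m))
                  for c, m in self.centroids.items()}
        top = max(logits.values())
        exp = {c: math.exp(v - top) for c, v in logits.items()}
        total = sum(exp.values())
        return {c: v / total for c, v in exp.items()}

    def predict(self, x):
        p = self.predict_proba(x)
        return max(p, key=p.get)  # ties go to the first (smallest) class label


def fit_adafscv_sketch(X_train, y_train, X_val, y_val, block_size):
    """Train one classifier per block of ranked features, from high to low rank."""
    order = rank_features(X_train, y_train)
    chance = 1.0 / len(set(y_train))  # expected accuracy of a uniform random guess
    ensemble = []
    for start in range(0, len(order), block_size):
        clf = NearestCentroid(order[start:start + block_size]).fit(X_train, y_train)
        acc = sum(clf.predict(x) == t for x, t in zip(X_val, y_val)) / len(y_val)
        if acc <= chance:
            break  # this feature block is no better than guessing: stop iterating
        ensemble.append(clf)
    return ensemble


def vote(ensemble, x):
    """Assumed probability voting: sum class probabilities, return the argmax."""
    if not ensemble:
        raise ValueError("empty ensemble: even the top-ranked block did not beat chance")
    totals = defaultdict(float)
    for clf in ensemble:
        for c, p in clf.predict_proba(x).items():
            totals[c] += p
    return max(totals, key=totals.get)


# Toy usage: only features 0 and 1 separate the two classes.
X_train = [[0.0, 0.1, 5.0, 1.0], [0.2, 0.0, 4.0, 2.0],
           [1.0, 0.9, 5.0, 2.0], [0.8, 1.0, 4.0, 1.0]]
y_train = [0, 0, 1, 1]
X_val = [[0.1, 0.0, 4.0, 1.0], [0.9, 1.0, 5.0, 2.0]]
y_val = [0, 1]
ensemble = fit_adafscv_sketch(X_train, y_train, X_val, y_val, block_size=2)
print([clf.features for clf in ensemble])  # [[1, 0]]: block [2, 3] only reaches chance
print(vote(ensemble, [0.95, 0.9, 4.5, 1.5]))  # 1
```

In the toy run, the second block (features 2 and 3) has identical class centroids, so its held-out accuracy equals chance and the loop ends. Stopping at the first such block follows the abstract's description of automatic termination. Summing probabilities rather than hard votes lets a confident low-rank classifier outweigh an uncertain high-rank one; the paper's actual voting weights may differ.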
