Ensemble Feature Selection Based on Contextual Merit and Correlation Heuristics

Recent research has demonstrated the benefits of using ensembles of classifiers for classification problems. Ensembles of diverse and accurate base classifiers are constructed by machine learning methods that manipulate the training set. One way to manipulate the training set is to apply feature selection heuristics when generating the base classifiers. In this paper we examine two of them: correlation-based and contextual merit-based heuristics. Both rely on quite similar assumptions concerning heterogeneous classification problems. Experiments are conducted on several data sets from the UCI Machine Learning Repository. We construct a fixed number of base classifiers over selected feature subsets and refine the ensemble iteratively, promoting diversity among the base classifiers and relying on growth in global accuracy. According to the experimental results, the contextual merit-based ensemble outperforms both the correlation-based ensemble and C4.5. The correlation-based ensemble produces more diverse and simpler base classifiers, and the iterations promoting diversity have a less evident effect for it than for the contextual merit-based ensemble.
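The iterative ensemble refinement described above can be sketched in code. The sketch below is a minimal illustrative toy, not the paper's actual method: the base learners here are 1-NN classifiers (the paper uses C4.5 decision trees), the feature subsets are refined by random single-feature swaps accepted on global accuracy growth (a crude stand-in for the contextual merit and correlation heuristics), and all data, function names (`refine`, `ensemble_accuracy`), and parameters are hypothetical.

```python
import random

def knn_predict(train, ytr, x, feats):
    """1-NN prediction using only the selected feature indices."""
    best_label, best_dist = None, float("inf")
    for xi, yi in zip(train, ytr):
        d = sum((xi[f] - x[f]) ** 2 for f in feats)
        if d < best_dist:
            best_dist, best_label = d, yi
    return best_label

def ensemble_accuracy(train, ytr, test, yte, subsets):
    """Majority-vote accuracy of base classifiers built on each subset."""
    correct = 0
    for x, y in zip(test, yte):
        votes = [knn_predict(train, ytr, x, s) for s in subsets]
        pred = max(set(votes), key=votes.count)  # plurality vote
        correct += (pred == y)
    return correct / len(test)

def refine(train, ytr, test, yte, n_feats, subsets, iters=20, seed=0):
    """Iteratively swap one feature in one subset; keep the change
    whenever global ensemble accuracy does not decrease."""
    rng = random.Random(seed)
    acc = ensemble_accuracy(train, ytr, test, yte, subsets)
    for _ in range(iters):
        i = rng.randrange(len(subsets))
        cand = set(subsets[i])
        out = rng.choice(sorted(cand))         # feature to drop
        inn = rng.randrange(n_feats)           # feature to try instead
        trial = list(subsets)
        trial[i] = frozenset((cand - {out}) | {inn})
        new_acc = ensemble_accuracy(train, ytr, test, yte, trial)
        if new_acc >= acc:
            subsets, acc = trial, new_acc
    return subsets, acc

# Toy data: feature 0 separates the classes; features 1 and 2 are noise.
train = [[0.0, 5.0, 1.0], [0.1, 2.0, 9.0], [1.0, 5.0, 2.0], [0.9, 1.0, 8.0]]
ytr = [0, 0, 1, 1]
test = [[0.05, 3.0, 4.0], [0.95, 3.0, 5.0]]
yte = [0, 1]

# Start from two base classifiers built on the noisy features.
subsets, acc = refine(train, ytr, test, yte, n_feats=3,
                      subsets=[frozenset({1}), frozenset({2})])
```

In this stand-in, acceptance on `new_acc >= acc` plays the role of the paper's "global accuracy growth" criterion; the paper additionally scores candidate features by contextual merit or correlation rather than sampling them uniformly at random.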

[1]  Claire Cardie, et al.  Examining Locally Varying Weights for Nearest Neighbor Algorithms, 1997, ICCBR.

[2]  J. Ross Quinlan, et al.  C4.5: Programs for Machine Learning, 1992.

[3]  Salvatore J. Stolfo, et al.  Pruning Classifiers in a Distributed Meta-Learning System, 1998.

[4]  Huan Liu, et al.  Feature Selection for Classification, 1997, Intell. Data Anal..

[5]  Kagan Tumer, et al.  Dimensionality Reduction Through Classifier Ensembles, 1999.

[6]  D. Opitz, et al.  Popular Ensemble Methods: An Empirical Study, 1999, J. Artif. Intell. Res..

[7]  Alexey Tsymbal, et al.  Local feature selection for heterogeneous problems, 2000.

[8]  P. Schönemann  On artificial intelligence, 1985, Behavioral and Brain Sciences.

[9]  Mark A. Hall, et al.  Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning, 1999, ICML.

[10]  Kagan Tumer, et al.  Error Correlation and Error Reduction in Ensemble Classifiers, 1996, Connect. Sci..

[11]  Wei Fan, et al.  Using Conflicts Among Multiple Base Classifiers to Measure the Performance of Stacking, 1999.

[12]  Roberto Battiti, et al.  Democracy in neural nets: Voting schemes for classification, 1994, Neural Networks.

[13]  Edwin P. D. Pednault, et al.  Decomposition of Heterogeneous Classification Problems, 1997, IDA.

[14]  Aiko M. Hormann, et al.  Programs for Machine Learning. Part I, 1962, Inf. Control..

[15]  Alexey Tsymbal, et al.  Ensemble Feature Selection Based on the Contextual Merit, 2001, DaWaK.

[16]  Lars Kai Hansen, et al.  Neural Network Ensembles, 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  David W. Opitz, et al.  Feature Selection for Ensembles, 1999, AAAI/IAAI.

[18]  David J. Hand, et al.  Advances in Intelligent Data Analysis, 2000, Lecture Notes in Computer Science.

[19]  Thomas G. Dietterich  Machine-Learning Research: Four Current Directions, 1997.

[20]  Se June Hong, et al.  Use of Contextual Information for Feature Ranking and Discretization, 1997, IEEE Trans. Knowl. Data Eng..