Genetic Programming with Interval Functions and Ensemble Learning for Classification with Incomplete Data

Missing values are an unavoidable issue in many real-world datasets. Classification with incomplete data must be handled carefully, because inadequate treatment of missing values often leads to large classification error. Interval genetic programming (IGP) uses genetic programming directly to evolve an effective and efficient classifier for incomplete data. This paper proposes to improve IGP for classification with incomplete data by integrating it with ensemble learning, so that a set of classifiers is built rather than a single one. Experimental results show that evolving a set of classifiers in this way achieves better accuracy than IGP alone. The proposed method is also more accurate than other common methods for classification with incomplete data.
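The abstract does not spell out how interval functions or the ensemble are implemented, so the following is only a minimal sketch under stated assumptions: a missing feature value is represented by the interval between that feature's assumed minimum and maximum, a known value by a degenerate interval, and the ensemble is combined by simple majority vote. The helper names (`to_interval`, `classify`, `majority_vote`), the midpoint decision rule, and the three hand-written expressions standing in for evolved programs are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Interval = Tuple[float, float]


def to_interval(value: Optional[float], bounds: Interval) -> Interval:
    """Assumption: a missing value (None) becomes the feature's [min, max] range."""
    return bounds if value is None else (value, value)


def add(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def sub(a: Interval, b: Interval) -> Interval:
    return (a[0] - b[1], a[1] - b[0])


def mul(a: Interval, b: Interval) -> Interval:
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


def classify(output: Interval) -> int:
    """Hypothetical decision rule: class 1 if the interval midpoint is non-negative."""
    midpoint = (output[0] + output[1]) / 2.0
    return 1 if midpoint >= 0 else 0


def encode(instance: Sequence[Optional[float]],
           bounds: Sequence[Interval]) -> List[Interval]:
    """Turn a possibly incomplete instance into a list of intervals."""
    return [to_interval(v, b) for v, b in zip(instance, bounds)]


def majority_vote(classifiers: Sequence[Callable[[List[Interval]], int]],
                  x: List[Interval]) -> int:
    """Combine the ensemble by taking the most common predicted class."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hand-written expressions standing in for evolved GP programs (illustrative only).
ensemble = [
    lambda x: classify(sub(x[0], x[1])),                        # x0 - x1
    lambda x: classify(sub(mul(x[0], x[1]), (20.0, 20.0))),     # x0 * x1 - 20
    lambda x: classify(add(x[0], (-2.0, -2.0))),                # x0 - 2
]

feature_bounds = [(0.0, 10.0), (0.0, 4.0)]   # assumed per-feature ranges
incomplete_instance = [None, 3.0]            # first feature is missing

prediction = majority_vote(ensemble, encode(incomplete_instance, feature_bounds))
```

Because every operator accepts and returns intervals, the same expression handles complete and incomplete instances without a separate imputation step, which is the property the sketch is meant to illustrate.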
