Kaizen Programming for Feature Construction for Classification

A data set for classification is commonly composed of a set of features that define the representation of the data space and one attribute that gives each instance's class. A classification tool has to discover how to separate the classes based on the features, but this discovery may be hampered by inadequate or insufficient features. Pre-processing steps that automatically construct new high-level features have been proposed to uncover hidden relationships among features and to improve classification quality. Here we present a new tool for high-level feature construction: Kaizen Programming. This tool can construct many complementary/dependent high-level features simultaneously. We show that our approach outperforms related methods on well-known binary-class medical data sets using a decision-tree classifier, achieving greater accuracy and smaller trees.
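
To make the motivation concrete, the following minimal sketch shows how a single constructed feature can expose a relationship that no original feature reveals alone. It is not Kaizen Programming: the toy data, the hand-picked product feature, and the helper `best_stump_accuracy` are all assumptions made for illustration, and a depth-1 threshold split stands in for the decision-tree classifier.

```python
from itertools import product


def best_stump_accuracy(values, labels):
    """Best accuracy of one threshold split (a depth-1 tree) on one feature."""
    n = len(values)
    distinct = sorted(set(values))
    # Candidate thresholds: one below the minimum, then midpoints
    # between consecutive distinct values.
    thresholds = [distinct[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = 0.0
    for t in thresholds:
        predictions = [1 if v > t else 0 for v in values]
        correct = sum(p == y for p, y in zip(predictions, labels))
        # Either side of the split may be assigned to class 1.
        best = max(best, correct / n, (n - correct) / n)
    return best


# Hypothetical toy data: the class is 1 when x1 and x2 share a sign.
points = list(product([-2.0, -1.0, 1.0, 2.0], repeat=2))
labels = [1 if x1 * x2 > 0 else 0 for x1, x2 in points]

x1 = [p[0] for p in points]
x2 = [p[1] for p in points]
# Hand-picked high-level feature combining both original features.
constructed = [a * b for a, b in zip(x1, x2)]

acc_x1 = best_stump_accuracy(x1, labels)
acc_x2 = best_stump_accuracy(x2, labels)
acc_constructed = best_stump_accuracy(constructed, labels)

print(acc_x1, acc_x2, acc_constructed)
```

On this toy data neither original feature does better than chance with one split, while the product feature separates the classes with a single threshold. That is the kind of effect on accuracy and tree size that feature construction aims for.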
