PGGP: Prototype Generation via Genetic Programming

Abstract: Prototype generation (PG) methods aim to find a subset of instances taken from a large training data set, such that classification performance (commonly measured with a 1NN classifier) when using the prototypes is equal to or better than that obtained with the original training set. Several PG methods have been proposed so far. Most of them take a small subset of training instances as initial prototypes and modify them to maximize classification performance on the whole training set. Although some of these methods have obtained acceptable results, training instances may be under-exploited, because most of the time they are used only to guide the search process. This paper introduces a PG method based on genetic programming in which many training samples are combined through arithmetic operators to build highly effective prototypes. The genetic program aims to generate prototypes that maximize an estimate of the generalization performance of a 1NN classifier. Experimental results are reported on benchmark data sets designed to assess PG methods. Several aspects of the genetic program are evaluated, and the method is compared to many alternative PG methods. The empirical assessment shows the effectiveness of the proposed approach, which outperforms most state-of-the-art PG techniques on both small and large data sets. Better results were obtained for data sets with numeric attributes only, although the performance of the proposed technique on mixed data was very competitive as well.
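The core idea of the abstract, prototypes built by combining training samples through arithmetic operators and scored with a 1NN classifier, can be illustrated with a minimal sketch. It is not the authors' implementation: the operator set (`OPS`), the tree encoding, the use of plain training accuracy as fitness, and the random search standing in for genetic-programming evolution are all assumptions made here for illustration, as are the function names.

```python
import random

# Assumed operator set: element-wise arithmetic on two feature vectors.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "avg": lambda a, b: (a + b) / 2.0,
}

def random_tree(samples, depth, rng):
    """Random expression tree: leaves are training samples, inner nodes are operators."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(samples)  # leaf: a tuple of floats
    op = rng.choice(sorted(OPS))
    return (op, random_tree(samples, depth - 1, rng),
            random_tree(samples, depth - 1, rng))

def evaluate(tree):
    """Turn an expression tree into a single prototype vector."""
    if not isinstance(tree[0], str):  # leaf node
        return list(tree)
    op, left, right = tree
    return [OPS[op](a, b) for a, b in zip(evaluate(left), evaluate(right))]

def predict_1nn(prototypes, x):
    """Label of the prototype closest to x (squared Euclidean distance)."""
    return min(prototypes,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def accuracy(prototypes, data):
    """Fraction of (x, y) pairs classified correctly by 1NN over the prototypes."""
    return sum(predict_1nn(prototypes, x) == y for x, y in data) / len(data)

def search(data, n_candidates=50, depth=2, seed=0):
    """Random search over one-prototype-per-class solutions (stand-in for GP)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    best, best_acc = None, -1.0
    for _ in range(n_candidates):
        protos = [(evaluate(random_tree(xs, depth, rng)), y)
                  for y, xs in sorted(by_class.items())]
        acc = accuracy(protos, data)  # assumed fitness: training accuracy
        if acc > best_acc:
            best, best_acc = protos, acc
    return best, best_acc

# Toy data: two classes in two dimensions.
data = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((1.0, 1.0), 1), ((0.9, 1.1), 1)]
prototypes, acc = search(data)
print(prototypes, acc)
```

A real genetic program would replace the random search with selection, crossover, and mutation over the trees, and would use a held-out or cross-validated estimate of 1NN performance rather than training accuracy.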
