Optimization of Classifiers using Genetic Programming

The success of a pattern classification system depends on the improvement of its classification stage. This thesis investigates the potential of the Genetic Programming (GP) search space to optimize the performance of various classification models. Two GP approaches are proposed. In the first approach, GP is used to optimize the performance of individual classifiers: the performance of linear classifiers and nearest neighbor classifiers is improved during GP evolution to develop a high-performance numeric classifier. In the second approach, component classifiers are trained on the input data and their predictions are extracted. The GP search space is then used to combine the predictions of the component classifiers to develop an optimal composite classifier (OCC). This composite classifier extracts useful information from its component classifiers during the evolution process, so that its decision space is more informative and discriminant. The effectiveness of the GP combination technique is investigated for four types of classification models: linear classifiers, support vector machine (SVM) classifiers, statistical classifiers, and instance-based nearest neighbor classifiers. The success of such composite classifiers is demonstrated through various experiments, using the Receiver Operating Characteristic (ROC) curve as the performance measure. The experimental results show that the OCC outperforms its component classifiers and that it attains a high margin of improvement for small feature sets. Further, it is concluded that classification models developed by a heterogeneous combination of classifiers give more promising results than those developed by a homogeneous combination. The GP optimization technique automatically handles the selection of suitable component classifiers and model selection. Two main objectives are achieved by using GP optimization. The first is the development of more optimal classification models. The second is the enhancement of the GP search strategy itself.
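
The second approach, in which GP combines the predictions of component classifiers, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function set, the tuple-based tree encoding, the mutation-only evolution with elitism, the population settings, and the use of the area under the ROC curve (AUC) as a scalar fitness. The names `random_tree`, `evaluate`, `auc`, `fitness`, `mutate`, and `evolve` are hypothetical.

```python
import operator
import random

# Function set for the expression trees (an assumption, not the thesis's set).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul,
             "max": max, "min": min}

def random_tree(n_inputs, depth, rng):
    """Grow a random expression tree over component-classifier outputs."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_inputs))  # terminal: one classifier's prediction
    name = rng.choice(sorted(FUNCTIONS))
    return (name,
            random_tree(n_inputs, depth - 1, rng),
            random_tree(n_inputs, depth - 1, rng))

def evaluate(tree, preds):
    """Combine one sample's component predictions into a single score."""
    if tree[0] == "x":
        return preds[tree[1]]
    return FUNCTIONS[tree[0]](evaluate(tree[1], preds), evaluate(tree[2], preds))

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fitness(tree, pred_matrix, labels):
    """Fitness of a tree: AUC of its combined scores on the given data."""
    return auc([evaluate(tree, row) for row in pred_matrix], labels)

def mutate(tree, n_inputs, rng):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if tree[0] == "x" or rng.random() < 0.3:
        return random_tree(n_inputs, 2, rng)
    if rng.random() < 0.5:
        return (tree[0], mutate(tree[1], n_inputs, rng), tree[2])
    return (tree[0], tree[1], mutate(tree[2], n_inputs, rng))

def evolve(pred_matrix, labels, pop_size=30, generations=20, seed=0):
    """Mutation-only evolution with elitism (a simplification of full GP)."""
    rng = random.Random(seed)
    n_inputs = len(pred_matrix[0])
    # Seed the population with each component on its own, so the best
    # individual never scores below the best component on this data.
    pop = [("x", i) for i in range(n_inputs)]
    pop += [random_tree(n_inputs, 3, rng) for _ in range(pop_size - n_inputs)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, pred_matrix, labels), reverse=True)
        elite = pop[: pop_size // 2]
        children = [mutate(rng.choice(elite), n_inputs, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda t: fitness(t, pred_matrix, labels))
```

Here `pred_matrix` holds one row per sample and one column per component classifier, and `labels` holds the 0/1 class of each sample. Because the single-component trees are part of the initial population and the elite is carried over unchanged, the returned tree is at least as good as the best component on the data used for evolution; whether that advantage carries over to unseen data has to be checked on a held-out set.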
