On the Use of Evolutionary Algorithms in Data Mining

With computers becoming more pervasive, disks becoming cheaper, and sensors becoming ubiquitous, we are collecting data at an ever-increasing pace. However, it is far easier to collect the data than to extract useful information from it. Sophisticated techniques, such as those developed in the multidisciplinary field of data mining, are increasingly being applied to the analysis of these datasets in commercial and scientific domains. As the problems become larger and more complex, researchers are turning to heuristic techniques to complement existing approaches. This survey paper examines the role that evolutionary algorithms (EAs) can play in the various stages of data mining. We view data mining as the end-to-end process of finding patterns, starting with the raw data. The paper focuses on feature extraction, feature selection, classification, and clustering, and surveys the state of the art in applying evolutionary algorithms to these areas. We examine the use of evolutionary algorithms both in isolation and in combination with other algorithms, such as neural networks and decision trees. The paper concludes with a summary of open research problems and opportunities for the future.
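As a concrete illustration of one stage mentioned above, feature selection, the sketch below runs a simple generational genetic algorithm over bit masks, where bit j says whether feature j is kept. It is a minimal sketch under stated assumptions, not a method taken from the survey: it assumes a wrapper-style fitness (resubstitution accuracy of a nearest-centroid classifier minus a penalty of 0.01 per kept feature), and the toy data generator, the classifier, the penalty, and all function names are hypothetical choices made only for illustration.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def make_toy_data(n=60, n_features=8, informative=(0, 1), seed=1):
    """Hypothetical two-class data: only the `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 if label else -3.0  # shift class means apart
        X.append(row)
        y.append(label)
    return X, y


def nearest_centroid_accuracy(X, y, mask: Mask) -> float:
    """Resubstitution accuracy of a nearest-centroid classifier on the kept columns.
    (Optimistic; a real study would score on held-out data.)"""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(centroids,
                   key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += pred == t
    return correct / len(X)


def ga_feature_selection(n_features: int, fitness: Callable[[Mask], float], pop_size=20,
                         generations=30, crossover_rate=0.9, mutation_rate=0.05,
                         seed=0) -> Tuple[Mask, float]:
    """Generational GA over bit masks: binary tournament selection, one-point
    crossover, bit-flip mutation, and elitism (best mask seen so far is kept)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best, best_fit = pop[0][:], float("-inf")

    def evaluate(population):
        nonlocal best, best_fit
        scored = [(fitness(ind), ind) for ind in population]
        for f, ind in scored:
            if f > best_fit:
                best, best_fit = ind[:], f
        return scored

    def tournament(scored):
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = evaluate(pop)
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    evaluate(pop)  # score the final offspring too
    return best, best_fit


if __name__ == "__main__":
    X, y = make_toy_data()
    fit = lambda m: nearest_centroid_accuracy(X, y, m) - 0.01 * sum(m)
    mask, score = ga_feature_selection(len(X[0]), fit)
    print("selected features:", [j for j, b in enumerate(mask) if b], "fitness:", round(score, 3))
```

The per-feature penalty is one simple way to trade accuracy against subset size; because the fitness function is passed in as a parameter, the same search loop could wrap any other classifier or scoring rule.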
