A FAST GENETIC ALGORITHM FOR MINING CLASSIFICATION RULES IN LARGE DATASETS

Modern data repositories are extremely large. Building a rule-based classification model for such data sets with a Genetic Algorithm becomes an extremely complex process, because the algorithm makes several passes over the training data during learning, which makes it I/O-intensive and unsuitable for large data sets. One way to solve this problem is to build the model incrementally. This paper proposes an incremental Genetic Algorithm that builds the rule-based classification model in a fine-grained manner, independently evolving tiny components as the data set evolves. This reduces the learning cost and makes the approach scalable to large data sets.
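The general idea of incremental evolution can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper, whose details the abstract does not give. The sketch assumes binary attributes, a two-class problem, and a rule encoding in which each position is 0, 1, or None (don't care); the names `matches`, `fitness`, `evolve_on_chunk`, and `incremental_ga` are hypothetical. The data arrive in chunks, and the rule population that survives one chunk seeds the evolution on the next, so fitness is always evaluated on the current chunk rather than by repeated passes over the full data set.

```python
import random

VALUES = [0, 1, None]  # None means "don't care" for that attribute (assumed encoding)

def matches(rule, x):
    """A rule covers a record if every non-wildcard position agrees."""
    return all(r is None or r == v for r, v in zip(rule, x))

def fitness(rule, chunk):
    """Accuracy on one chunk: predict class 1 if the rule matches, else class 0."""
    correct = sum(int(matches(rule, x)) == y for x, y in chunk)
    return correct / len(chunk)

def mutate(rule, rng):
    """Replace one randomly chosen position with a random value."""
    i = rng.randrange(len(rule))
    new = list(rule)
    new[i] = rng.choice(VALUES)
    return tuple(new)

def crossover(a, b, rng):
    """One-point crossover; requires at least two attributes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve_on_chunk(population, chunk, rng, generations=20):
    """Evolve the population using only the records in the current chunk."""
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=lambda r: fitness(r, chunk), reverse=True)
        parents = ranked[: size // 2]  # elitism: the best half always survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return sorted(population, key=lambda r: fitness(r, chunk), reverse=True)

def incremental_ga(chunks, n_features, pop_size=20, seed=0):
    """Process chunks one at a time, carrying the population forward."""
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(VALUES) for _ in range(n_features))
        for _ in range(pop_size)
    ]
    for chunk in chunks:
        population = evolve_on_chunk(population, chunk, rng)
    return population[0]  # best rule with respect to the most recent chunk
```

In this sketch each chunk can be read from storage once and then discarded, and the cost of a generation depends on the chunk size rather than on the size of the whole repository. A full system would also need to handle multiple classes and rule sets rather than a single rule.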
