A parallel island model for biogeography-based classification rule mining in Julia

In this paper, we present a distributed island model implementation of biogeography-based optimization for classification rule mining (island BBO-RM). Island BBO-RM is an evolutionary algorithm for rule mining that uses Pittsburgh-style classification rule encoding, in which an entire ruleset (classifier) is represented as a single chromosome. Our algorithm relies on biogeography-based optimization (BBO), an optimization technique inspired by the migration patterns of species between habitats. BBO has been reported to perform well in applications ranging from function optimization to image classification. A major limitation of evolutionary rule mining algorithms is their high computational cost and long running time. To address this limitation, we apply a distributed island model to parallelize the BBO-based rule extraction phase. We explore several migration topologies and data windowing techniques. Our algorithm is implemented in Julia, a dynamic programming language designed for high-performance and parallel computation. Our results show that the distributed implementation achieves considerable speedups over a serial implementation. Without data windowing, we obtain speedups of up to a factor of nine with no loss of classification accuracy. With data windowing, we obtain speedups of up to a factor of 30, with a small loss of accuracy in some cases.
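To make the island model concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It is written in Python rather than Julia, runs the islands one after another instead of in parallel, and replaces BBO with a simple mutate-and-select step. The toy objective (counting 1-bits), the ring topology, and all function names and parameter values are assumptions made for illustration only.

```python
import random


def fitness(ind):
    # Toy objective (count of 1-bits); a hypothetical stand-in for ruleset quality.
    return sum(ind)


def evolve(pop, rng, mutation_rate=0.05):
    # One generation of a deliberately simple elitist step (NOT BBO itself):
    # flip each bit with probability mutation_rate, then keep the best
    # len(pop) individuals among parents and children.
    children = [[bit ^ (rng.random() < mutation_rate) for bit in ind] for ind in pop]
    merged = sorted(pop + children, key=fitness, reverse=True)
    return merged[:len(pop)]


def ring_neighbors(n_islands):
    # Ring topology (an assumed example): island i sends to island (i + 1) mod n.
    return {i: (i + 1) % n_islands for i in range(n_islands)}


def migrate(islands, topology, n_migrants=1):
    # Pick emigrants from every island first, so the exchange is synchronous.
    emigrants = {
        i: sorted(pop, key=fitness, reverse=True)[:n_migrants]
        for i, pop in enumerate(islands)
    }
    for src, dst in topology.items():
        pop = sorted(islands[dst], key=fitness, reverse=True)
        # Copies of the source island's best replace the destination's worst.
        pop[-n_migrants:] = [list(m) for m in emigrants[src]]
        islands[dst] = pop


def run_islands(n_islands=4, pop_size=10, n_bits=20,
                generations=40, interval=5, seed=0):
    rng = random.Random(seed)
    islands = [
        [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(n_islands)
    ]
    topology = ring_neighbors(n_islands)
    for gen in range(1, generations + 1):
        # In a distributed setting each island would evolve on its own worker.
        islands = [evolve(pop, rng) for pop in islands]
        if gen % interval == 0:
            migrate(islands, topology)
    return islands
```

Calling `run_islands()` returns the final populations of all islands. Changing the mapping returned by `ring_neighbors` is one way to try a different migration topology in this sketch.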
