Effects of parallel distributed implementation on the search performance of Pittsburgh-style genetics-based machine learning algorithms

Pittsburgh-style genetics-based machine learning (GBML) algorithms have strong search ability for obtaining rule-based classifiers. However, when they are applied to data mining from large data sets, fitness evaluation requires a huge amount of computation time. In our previous studies, we proposed a parallel distributed implementation of fuzzy GBML for fuzzy classifier design from large data sets. The basic idea of our parallel distributed implementation is to divide not only the population but also the training data set into N sub-populations and N training data subsets, respectively. A pair of a sub-population and a training data subset is assigned to each of N CPU cores in a workstation or a cluster. This dual division strategy achieved a quadratic speedup (i.e., N² times faster than the use of a single CPU core) while maintaining the generalization ability on test data. In this paper, we apply our parallel distributed implementation to GAssist, a non-fuzzy Pittsburgh-style GBML algorithm, and examine the effects of the number of divisions on its search ability in comparison with the parallel distributed fuzzy GBML algorithm.
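
As a rough illustration of the dual division strategy, the following Python sketch splits a toy population and a toy training data set into N parts and evaluates each (sub-population, data subset) pair on its own CPU core. It is not the implementation used in the paper: `fitness`, `evaluate_pair`, `split`, and the toy threshold-vector "individuals" are hypothetical placeholders, and details such as rule set migration and training data rotation are omitted.

```python
# Minimal sketch (assumed, not the authors' code) of dual division:
# N sub-populations and N training data subsets, one pair per CPU core.
from multiprocessing import Pool
import random

N = 4  # number of divisions = number of CPU cores used


def fitness(rule_set, data_subset):
    """Hypothetical fitness: accuracy of one individual on its data subset.
    Here an 'individual' is just a threshold vector and a sample is (x, label);
    a real Pittsburgh-style individual would be a complete rule set."""
    correct = sum(
        1 for x, label in data_subset
        if int(any(xi > ti for xi, ti in zip(x, rule_set))) == label
    )
    return correct / len(data_subset)


def evaluate_pair(pair):
    """Evaluate one sub-population on its assigned training data subset."""
    sub_population, data_subset = pair
    return [fitness(individual, data_subset) for individual in sub_population]


def split(items, n):
    """Split a list into n roughly equal parts (round-robin)."""
    return [items[i::n] for i in range(n)]


if __name__ == "__main__":
    # Toy training data and population; in the paper these would be a large
    # training data set and a population of rule-based classifiers.
    data = [([random.random() for _ in range(3)], random.randint(0, 1))
            for _ in range(10000)]
    population = [[random.random() for _ in range(3)] for _ in range(40)]

    sub_populations = split(population, N)  # N sub-populations
    data_subsets = split(data, N)           # N training data subsets

    # Each (sub-population, data subset) pair is evaluated on one core, so
    # each core handles only 1/N of the individuals on 1/N of the training
    # data: the source of the roughly N^2 speedup in fitness evaluation.
    with Pool(processes=N) as pool:
        fitness_values = pool.map(
            evaluate_pair, list(zip(sub_populations, data_subsets)))
    print([round(f, 3) for sub in fitness_values for f in sub])
```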
