Schema Theory-Based Data Engineering in Gene Expression Programming for Big Data Analytics

Gene expression programming (GEP) is a data-driven evolutionary technique that is well suited to correlation mining. Parallel GEPs have been proposed to speed up the evolution process using a cluster of computers or a computer with multiple CPU cores. However, the generation structure of chromosomes and the size of the input data are two issues that tend to be neglected when speeding up GEP evolution. To fill this research gap, this paper proposes three guiding principles that characterize the computational nature of GEP evolution, based on an analysis of GEP schema theory. As a result, a novel data-engineered GEP is developed that closely follows the generation structure of chromosomes in parallelization and takes the input data size into account in segmentation. Experimental results on two data sets with complementary features show that the data-engineered GEP speeds up the evolution process significantly without loss of accuracy in data correlation mining. Based on the experimental tests, a computation model of the data-engineered GEP is further developed to demonstrate its high scalability in dealing with potential big data using a large number of CPU cores.
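
The abstract does not spell out the algorithm, so the following is only a generic illustration of the two ideas it names: segmenting the input data and evaluating a generation of candidates in parallel. Everything in it is an assumption made for illustration: the function names (`split_into_segments`, `squared_error`, `evaluate_population`), the mean-squared-error fitness, the contiguous equal-size segmentation, and the use of plain Python callables as stand-ins for decoded GEP chromosomes. It is not the paper's method.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]      # (input x, target y); hypothetical data layout
Model = Callable[[float], float]  # stand-in for a decoded GEP chromosome


def split_into_segments(data: Sequence[Sample], n_segments: int) -> List[Sequence[Sample]]:
    """Split data into n_segments contiguous parts of nearly equal size."""
    size, extra = divmod(len(data), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        end = start + size + (1 if i < extra else 0)
        segments.append(data[start:end])
        start = end
    return segments


def squared_error(model: Model, segment: Sequence[Sample]) -> float:
    """Sum of squared errors of one candidate on one data segment."""
    return sum((model(x) - y) ** 2 for x, y in segment)


def evaluate_population(models: Sequence[Model], data: Sequence[Sample],
                        n_segments: int = 4, workers: int = 4) -> List[float]:
    """Mean squared error of every candidate, computed segment by segment.

    Each (candidate, segment) pair is an independent task. Threads are used
    here only to keep the sketch portable; in CPython they will not speed up
    pure-Python arithmetic, so a real implementation would use processes or
    a native MPI/OpenMP backend instead.
    """
    segments = split_into_segments(data, n_segments)
    tasks = [(m, s) for m in models for s in segments]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda t: squared_error(*t), tasks))
    n = len(data)
    # Recombine the per-segment partial sums for each candidate.
    return [sum(partial[i * n_segments:(i + 1) * n_segments]) / n
            for i in range(len(models))]


if __name__ == "__main__":
    data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    population = [lambda x: 2.0 * x + 1.0, lambda x: x]
    print(evaluate_population(population, data))
```

Because the per-segment sums are recombined before dividing by the total sample count, the segmented evaluation gives the same fitness as a single pass over the whole data set, which is the property a "no loss of accuracy" claim depends on for an additive error measure like this one.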
