Covariance matrix self-adaptation evolution strategies and other metaheuristic techniques for neural adaptive learning

A covariance matrix self-adaptation evolution strategy (CMSA-ES) was compared with several metaheuristic techniques for multilayer perceptron (MLP)-based function approximation and classification. Function approximation was based on simulations of several 2D functions, and classification analysis was based on nine cancer DNA microarray data sets. MLP connection weight learning was carried out using genetic algorithms (GA–MLP), covariance matrix self-adaptation evolution strategies (CMSA-ES–MLP), back-propagation gradient-based learning (MLP), particle swarm optimization (PSO–MLP), and ant colony optimization (ACO–MLP). During function approximation runs, the input-side activation functions evaluated included linear, logistic, tanh, Hermite, Laguerre, exponential, and radial basis functions, while the output-side function was always linear. For classification, the input-side activation function was always logistic and the output-side function was always regularized softmax. Self-organizing maps (SOM) and unsupervised neural gas (NG) were used to reduce the dimensionality of the original gene expression input features used in classification. Results indicate that for function approximation, Hermite polynomial activation functions at hidden nodes combined with CMSA-ES–MLP connection weight learning yielded the greatest fitness levels. On average, the most elite chromosomes were observed for MLP ($\mathrm{MSE}=0.4977$), CMSA-ES–MLP (0.6484), PSO–MLP (0.7472), ACO–MLP (1.3471), and GA–MLP (1.4845). For classification analysis, overall average classifier performance was 92.64% (CMSA-ES–MLP), 92.22% (PSO–MLP), 91.30% (ACO–MLP), 89.36% (MLP), and 60.72% (GA–MLP). We have shown that MLP connection weight learning provides a reliable approach to function approximation when the assumed function is unknown: the MLP architecture itself defines the equation used for solving the unknown parameters relating inputs to output target values.
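The combination described above, Hermite polynomial activations at hidden nodes with evolutionary connection weight learning, can be sketched roughly as follows. The network size, weight layout, target function, and the simplified elitist (mu + lambda) evolution strategy with a fixed step size are all illustrative assumptions; the paper's actual method is CMSA-ES, which additionally adapts the step size and a full mutation covariance matrix.

```python
import math
import random

def hermite(n, x):
    # Physicists' Hermite polynomial H_n(x) via the recurrence
    # H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

HIDDEN = 4  # hidden nodes; node j uses H_{j+1} as its activation (assumed layout)

def mlp(x, w):
    # 1-input MLP: Hermite activations on the input side, linear output side
    out = w[3 * HIDDEN]                               # output-side bias
    for j in range(HIDDEN):
        z = w[j] * x + w[HIDDEN + j]                  # input-side weight and bias
        out += w[2 * HIDDEN + j] * hermite(j + 1, z)  # output-side weight
    return out

def mse(w, data):
    return sum((mlp(x, w) - y) ** 2 for x, y in data) / len(data)

def es_optimize(data, dim, mu=5, lam=20, gens=200, sigma=0.3, seed=1):
    # Elitist (mu + lambda) ES with fixed mutation strength: a minimal
    # stand-in for CMSA-ES weight learning, not the paper's implementation.
    rng = random.Random(seed)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(mu)]
    for _ in range(gens):
        offspring = [[wi + sigma * rng.gauss(0.0, 1.0) for wi in rng.choice(pop)]
                     for _ in range(lam)]
        pop = sorted(pop + offspring, key=lambda w: mse(w, data))[:mu]
    return pop[0]

# Example target (assumed for illustration): y = sin(pi * x) sampled on [-1, 1]
data = [(-1.0 + 0.1 * i, math.sin(math.pi * (-1.0 + 0.1 * i))) for i in range(21)]
best = es_optimize(data, dim=3 * HIDDEN + 1)
```

Here the "chromosome" is simply the flat weight vector and fitness is the mean squared error over the sampled points, mirroring the elite-chromosome MSE comparison reported above.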
A major drawback of using CMSA-ES for MLP weight learning is that when the number of MLP weights $N$ is large, the $\mathcal{O}(N^3)$ Cholesky factorization becomes a performance bottleneck. As an alternative, feature reduction using SOM and NG can greatly enhance the performance of CMSA-ES–MLP by reducing $N$. Future research into speeding up Cholesky factorization for CMSA-ES will help overcome the time complexity problems associated with large numbers of connection weights.
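The cubic cost arises because CMSA-ES samples candidates through a factor $L$ of the covariance matrix $C = LL^{\mathsf{T}}$, which must be recomputed as $C$ is adapted. A minimal textbook Cholesky routine (a sketch, not the paper's implementation) makes the triple-nested loop, and hence the $\mathcal{O}(N^3)$ scaling in the number of connection weights $N$, explicit:

```python
import math

def cholesky(A):
    # Lower-triangular L with A = L * L^T for a symmetric positive-definite A.
    # The triple-nested loop costs Theta(N^3) flops, the bottleneck noted
    # for CMSA-ES when N (the number of MLP connection weights) is large.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L
```

The cubic scaling is also why feature reduction pays off: halving $N$ via SOM or NG cuts the factorization work by roughly a factor of eight.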
