Integrated Evolutionary Learning: An Artificial Intelligence Approach to Joint Learning of Features and Hyperparameters for Optimized, Explainable Machine Learning

Artificial intelligence and machine learning techniques have proved fertile for attacking difficult problems in medicine and public health, and they have garnered strong interest for analyzing the large, multi-domain open-science datasets increasingly available in health research. Discovery science in such datasets is challenging because the learning environment is relatively unconstrained: there may be a large number of potential predictors, and appropriate ranges for model hyperparameters are unknown. Moreover, explainability is often at a premium, since recovered models must support future hypothesis generation and analysis. Here, we present a novel method that addresses these challenges by exploiting evolutionary algorithms to optimize machine learning discovery science while exploring a large solution space and minimizing bias. We demonstrate that our approach, called integrated evolutionary learning (IEL), provides an automated, adaptive method for jointly learning features and hyperparameters while furnishing explainable models in which the original features used to make predictions can be recovered, even with artificial neural networks. In IEL, the machine learning algorithm of choice is nested inside an evolutionary algorithm, which selects features and hyperparameters over generations on the basis of an information function to converge on an optimal solution. We apply IEL to three gold-standard machine learning algorithms in challenging, heterogeneous biobehavioral data: deep learning with artificial neural networks, decision-tree-based techniques, and baseline linear models. Using our novel IEL approach, artificial neural networks achieved ≥95% accuracy, sensitivity, and specificity in classification and 45–73% variance explained (R²) in regression, substantial gains over default settings. IEL may be applied to a wide range of less-constrained or unconstrained discovery science problems where the practitioner wishes to jointly learn features and hyperparameters in an adaptive, principled manner within a single algorithmic process. This approach offers significant flexibility, enlarges the solution space, mitigates the bias that can arise from manual or semi-manual hyperparameter tuning and feature selection, and creates the opportunity to select the inner machine learning algorithm based on the results of optimized learning for the problem at hand.
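To make the nested loop concrete, below is a minimal sketch of an IEL-style optimizer, not the authors' implementation. A chromosome concatenates a binary feature mask with hyperparameter genes; an inner learner (here scikit-learn's GradientBoostingClassifier, standing in for any of the three algorithm families) is scored by cross-validated accuracy as a stand-in for the information function; and truncation selection, single-point crossover, and mutation drive the generations. All names, hyperparameter grids, and genetic-algorithm settings (LEARNING_RATES, MAX_DEPTHS, population size, mutation rate) are illustrative assumptions.

```python
# Minimal IEL-style sketch (illustrative, not the authors' code): an inner
# learner is nested inside an evolutionary loop that jointly evolves a
# feature mask and hyperparameter genes, scored by cross-validated accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)
n_features = X.shape[1]

# Candidate hyperparameter grids (assumed values, for illustration only).
LEARNING_RATES = [0.01, 0.05, 0.1, 0.2]
MAX_DEPTHS = [1, 2, 3, 4]

def random_chromosome():
    # Layout: [feature-mask bits ... | learning-rate gene | max-depth gene]
    mask = rng.integers(0, 2, n_features)
    hp = [rng.integers(len(LEARNING_RATES)), rng.integers(len(MAX_DEPTHS))]
    return np.concatenate([mask, hp])

def fitness(chrom):
    # Cross-validated accuracy stands in for the paper's information function.
    mask = chrom[:n_features].astype(bool)
    if not mask.any():
        return 0.0  # an empty feature set cannot be scored
    model = GradientBoostingClassifier(
        learning_rate=LEARNING_RATES[chrom[n_features]],
        max_depth=MAX_DEPTHS[chrom[n_features + 1]],
        random_state=0)
    return cross_val_score(model, X[:, mask], y, cv=3).mean()

def crossover(a, b):
    point = rng.integers(1, len(a))  # single-point crossover
    return np.concatenate([a[:point], b[point:]])

def mutate(chrom, rate=0.05):
    chrom = chrom.copy()
    for i in range(n_features):      # flip feature-mask bits
        if rng.random() < rate:
            chrom[i] ^= 1
    if rng.random() < rate:          # re-draw hyperparameter genes
        chrom[n_features] = rng.integers(len(LEARNING_RATES))
        chrom[n_features + 1] = rng.integers(len(MAX_DEPTHS))
    return chrom

# Evolve over generations; fitness is re-evaluated each time for simplicity
# (a real run would cache scores).
population = [random_chromosome() for _ in range(16)]
for generation in range(8):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:8]             # truncation selection
    children = []
    while len(parents) + len(children) < len(population):
        i, j = rng.choice(len(parents), size=2, replace=False)
        children.append(mutate(crossover(parents[i], parents[j])))
    population = parents + children

best = max(population, key=fitness)
print("selected features:", np.flatnonzero(best[:n_features]))
print("learning_rate =", LEARNING_RATES[best[n_features]],
      "| max_depth =", MAX_DEPTHS[best[n_features + 1]])
```

Because the winning chromosome carries an explicit feature mask alongside its hyperparameter genes, the original features driving predictions can be read off directly, which is the explainability property described above.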
