Island Transpeciation: A Co-Evolutionary Neural Architecture Search, Applied to Country-Scale Air-Quality Forecasting

Air pollution causes around 400,000 premature deaths per year in Europe, driven by particulate matter, nitrogen oxides, and ground-level ozone. Multiple-input multiple-output nonlinear autoregressive exogenous deep neural networks are frequently used to predict air-quality pollution incidents one day ahead at a country scale. As model complexity and data sizes increase, finding performant models becomes harder. We propose island transpeciation to optimize hyperparameters and architectures. Unlike approaches that rely on a single optimizer, island transpeciation combines results from multiple optimizers to provide consistently excellent performance. Moreover, we show that island transpeciation outperforms random model search and other previous modeling efforts. Island transpeciation is a neural architecture search that uses co-evolution (genes) to combine (transpeciation) populations of incompatible optimizers (species) organized in island formations. In island transpeciation, the architecture search is parallelized and utilizes a distributed pool of hardware resources. We have successfully used these techniques to predict next-day ozone concentrations across the Belgian territory.
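To make the island mechanics concrete, the sketch below (plain Python, standard library only) runs three islands, each driven by a different, mutually incompatible search strategy, and periodically migrates the best candidate around a ring. The genome encoding, the three stand-in optimizers, the surrogate fitness function, and all parameter values are illustrative assumptions, not the paper's implementation; a real deployment would train and score NARX networks in parallel across distributed hardware rather than evaluate a toy surrogate serially.

```python
import random

# Genome: a candidate architecture/hyperparameter set, encoded here as
# hidden-layer widths plus a learning rate. This encoding is an
# illustrative assumption, not the paper's actual genome.
def random_genome(rng):
    return {
        "layers": [rng.choice([16, 32, 64, 128]) for _ in range(rng.randint(1, 4))],
        "lr": 10 ** rng.uniform(-4, -2),
    }

# Surrogate fitness standing in for "train the NARX network and score its
# next-day ozone forecasts"; lower is better. Purely hypothetical.
def fitness(g):
    return (abs(len(g["layers"]) - 2)
            + sum(abs(w - 64) for w in g["layers"]) / 64
            + abs(g["lr"] - 1e-3) * 1000)

def mutate(g, rng):
    child = {"layers": list(g["layers"]), "lr": g["lr"]}
    if rng.random() < 0.5:
        i = rng.randrange(len(child["layers"]))
        child["layers"][i] = rng.choice([16, 32, 64, 128])
    else:
        child["lr"] *= 10 ** rng.uniform(-0.3, 0.3)
    return child

# Each island runs its own optimizer ("species"); the three strategies are
# deliberately different and share nothing but the genome format.
def step_random(pop, rng):                 # island 1: pure random sampling
    return [random_genome(rng) for _ in pop]

def step_es(pop, rng):                     # island 2: (mu + lambda)-style ES
    parents = sorted(pop, key=fitness)[: max(1, len(pop) // 2)]
    children = [mutate(rng.choice(parents), rng) for _ in range(len(pop) - len(parents))]
    return parents + children

def step_hillclimb(pop, rng):              # island 3: greedy local search
    return [min(g, mutate(g, rng), key=fitness) for g in pop]

def island_transpeciation(generations=30, pop_size=8, migrate_every=5, seed=0):
    rng = random.Random(seed)
    steppers = [step_random, step_es, step_hillclimb]
    islands = [[random_genome(rng) for _ in range(pop_size)] for _ in steppers]
    for gen in range(generations):
        # In the real system each island would evolve in parallel on its own
        # hardware; this sketch steps them serially for clarity.
        islands = [step(pop, rng) for step, pop in zip(steppers, islands)]
        if (gen + 1) % migrate_every == 0:
            # Transpeciation: the best genome of each island replaces the
            # worst member of the next island (ring topology), letting
            # otherwise incompatible optimizers exchange genetic material.
            bests = [min(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                pop.sort(key=fitness)
                pop[-1] = bests[(i - 1) % len(islands)]
    return min((g for pop in islands for g in pop), key=fitness)

if __name__ == "__main__":
    best = island_transpeciation()
    print("best genome:", best, "surrogate fitness:", round(fitness(best), 3))
```

The ring migration is the only coupling between species: in the actual method migrants would have to be translated between the optimizers' internal representations, whereas here all islands conveniently share a single genome format.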
