An algorithmic framework for the optimization of deep neural networks architectures and hyperparameters

In this paper, we propose an algorithmic framework to automatically generate efficient deep neural networks and optimize their associated hyperparameters. The framework is based on evolving directed acyclic graphs (DAGs), which define a more flexible search space than those existing in the literature. It allows mixtures of classical operations (convolutions, recurrences and dense layers) as well as more recent operations such as self-attention. Based on this search space, we propose neighbourhood and evolution search operators to optimize both the architecture and the hyperparameters of our networks. These search operators can be used with any metaheuristic capable of handling mixed search spaces. We tested our algorithmic framework with an evolutionary algorithm on a time series prediction benchmark. The results demonstrate that our framework was able to find models outperforming the established baseline on numerous datasets.
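To make the idea of a DAG-based mixed search space more concrete, the following is a minimal illustrative sketch and not the paper's implementation. It assumes a DAG can be stored as a topologically ordered list of nodes plus a set of forward edges, so acyclicity holds by construction. The operation catalogue `OPS`, the hyperparameter ranges, and the helpers `random_dag`, `repair` and `mutate` are all hypothetical names chosen for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical operation catalogue: names and value ranges are assumptions.
OPS = {
    "conv1d": {"kernel_size": [1, 3, 5, 7]},
    "gru": {"hidden_size": [16, 32, 64]},
    "dense": {"units": [16, 32, 64, 128]},
    "self_attention": {"heads": [1, 2, 4]},
}


@dataclass
class Node:
    op: str       # categorical choice of operation
    params: dict  # hyperparameters of that operation


@dataclass
class DAG:
    nodes: list  # nodes stored in topological order
    edges: set   # pairs (i, j) with i < j, hence no cycles


def random_node(rng):
    op = rng.choice(sorted(OPS))
    return Node(op, {k: rng.choice(v) for k, v in OPS[op].items()})


def repair(dag):
    """Give every non-input node a predecessor and every non-output node a successor."""
    n = len(dag.nodes)
    for j in range(1, n):
        if not any((i, j) in dag.edges for i in range(j)):
            dag.edges.add((j - 1, j))
    for i in range(n - 1):
        if not any((i, j) in dag.edges for j in range(i + 1, n)):
            dag.edges.add((i, i + 1))


def random_dag(n_nodes, rng, p_edge=0.3):
    nodes = [random_node(rng) for _ in range(n_nodes)]
    edges = {(i, j) for i in range(n_nodes)
             for j in range(i + 1, n_nodes) if rng.random() < p_edge}
    dag = DAG(nodes, edges)
    repair(dag)
    return dag


def mutate(dag, rng):
    """Return a neighbour: flip one edge, swap one operation, or resample one hyperparameter."""
    child = DAG([Node(n.op, dict(n.params)) for n in dag.nodes], set(dag.edges))
    n = len(child.nodes)
    move = rng.choice(["edge", "op", "param"])
    if move == "edge" and n > 1:
        i = rng.randrange(n - 1)
        j = rng.randrange(i + 1, n)
        child.edges ^= {(i, j)}  # add the edge if absent, remove it if present
    elif move == "op":
        child.nodes[rng.randrange(n)] = random_node(rng)
    else:
        node = rng.choice(child.nodes)
        name = rng.choice(sorted(node.params))
        node.params[name] = rng.choice(OPS[node.op][name])
    repair(child)
    return child
```

Because edges only ever point from a lower to a higher index, any edge flip keeps the graph acyclic, and `repair` restores connectivity afterwards. A metaheuristic that handles mixed (categorical, integer, structural) variables could call `mutate` as its neighbourhood operator; how the actual framework encodes and mutates its graphs may differ.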
