Long short-term relevance learning

To incorporate prior knowledge as well as measurement uncertainties into traditional long short-term memory (LSTM) neural networks, an efficient sparse Bayesian training algorithm is introduced to the network architecture. In contrast to the classical LSTM solution, the proposed scheme automatically determines relevant neural connections and adapts accordingly. Due to its flexibility, the new LSTM scheme is less prone to overfitting and can hence approximate time-dependent solutions from a smaller data set. On a structural nonlinear finite element application we show that the self-regulating framework does not require prior knowledge of a suitable network architecture and size, while ensuring satisfactory accuracy at reasonable computational cost.

Keywords: LSTM, Neural network, Automatic relevance determination, Bayesian, Sparsity, Finite element model
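The idea of automatically determining relevant connections can be illustrated on a much simpler model than an LSTM. The following is a minimal sketch of automatic relevance determination (ARD) for Bayesian linear regression, using the standard fixed-point updates for the per-weight prior precisions. It is not the training algorithm of this paper; the function name `ard_regression`, the cap `alpha_cap`, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np

def ard_regression(Phi, y, n_iter=200, alpha_cap=1e8):
    """Illustrative ARD for y = Phi @ w + noise (hypothetical helper).

    Each weight w_i has its own Gaussian prior N(0, 1/alpha_i).
    A weight whose alpha_i grows very large is effectively pruned.
    """
    n, d = Phi.shape
    alpha = np.ones(d)   # per-weight prior precisions
    beta = 1.0           # noise precision
    for _ in range(n_iter):
        # Gaussian posterior over the weights for the current alpha, beta
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ y
        # gamma_i in [0, 1]: how well the data determine weight i
        gamma = 1.0 - alpha * np.diag(Sigma)
        # Fixed-point re-estimation of the precisions (capped for stability)
        alpha = np.minimum(gamma / (mu**2 + 1e-12), alpha_cap)
        resid = y - Phi @ mu
        beta = (n - gamma.sum()) / (resid @ resid + 1e-12)
    return mu, alpha, beta

# Synthetic data (assumed): only inputs 1 and 4 actually influence y.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[[1, 4]] = [2.0, -3.0]
y = Phi @ w_true + 0.1 * rng.normal(size=200)

mu, alpha, beta = ard_regression(Phi, y)
print("posterior mean weights:", np.round(mu, 3))
print("log10 prior precisions:", np.round(np.log10(alpha), 1))
```

In this toy setting the precisions of the two informative weights stay small, while those of the remaining weights become large and shrink the corresponding weights toward zero. The abstract describes the analogous effect for neural connections, where sparsity is obtained without fixing the network size in advance.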
