Robust Learning of Recurrent Neural Networks in Presence of Exogenous Noise

Recurrent neural networks (RNNs) have shown promising potential for learning the dynamics of sequential data. However, artificial neural networks are known to exhibit poor robustness in the presence of input noise, and the sequential architecture of RNNs exacerbates the problem. In this paper, we use ideas from control and estimation theory to propose a tractable robustness analysis for RNN models that are subject to input noise. The variance of the output of the noisy system is adopted as a robustness measure to quantify the impact of noise on learning. We show that this robustness measure can be estimated efficiently using linearization techniques. Building on these results, we propose a learning method that enhances the robustness of an RNN with respect to exogenous Gaussian noise with known statistics. Our extensive simulations on benchmark problems reveal that the proposed methodology significantly improves the robustness of recurrent neural networks.
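To make the linearization idea concrete: for additive input noise $w \sim \mathcal{N}(0, \sigma^2 I)$, a first-order expansion of the network output $f(u + w) \approx f(u) + J w$ gives an output variance of roughly $\sigma^2 \operatorname{tr}(J J^\top) = \sigma^2 \lVert J \rVert_F^2$, where $J$ is the input-output Jacobian. The sketch below (not the authors' code) illustrates one way to use this quantity as a training regularizer in PyTorch; the toy model, the Hutchinson-style single-probe estimate of $\lVert J \rVert_F^2$, the placeholder targets, and the hyperparameters `sigma2` and `lam` are all illustrative assumptions.

```python
# Minimal sketch of a linearization-based robustness penalty for an RNN.
# Assumption: output variance under input noise w ~ N(0, sigma^2 I) is
# approximated to first order by sigma^2 * ||J||_F^2 (J = input-output Jacobian).
import torch
import torch.nn as nn

torch.manual_seed(0)

class SimpleRNN(nn.Module):
    def __init__(self, n_in=4, n_hid=16, n_out=2):
        super().__init__()
        self.rnn = nn.RNN(n_in, n_hid, batch_first=True)
        self.head = nn.Linear(n_hid, n_out)

    def forward(self, u):                  # u: (batch, time, n_in)
        h, _ = self.rnn(u)
        return self.head(h[:, -1, :])      # read out the final time step

def jacobian_frobenius_sq(model, u):
    """Hutchinson estimate of ||df/du||_F^2 via one random VJP probe:
    E_v[||J^T v||^2] = tr(J J^T) for v ~ N(0, I)."""
    u = u.detach().requires_grad_(True)
    y = model(u)
    v = torch.randn_like(y)                # random probe on the output
    (g,) = torch.autograd.grad(y, u, grad_outputs=v, create_graph=True)
    return (g ** 2).sum() / u.shape[0]     # batch-averaged ||J^T v||^2

model = SimpleRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sigma2, lam = 0.01, 1.0                    # assumed noise variance / penalty weight

for step in range(200):
    u = torch.randn(32, 10, 4)             # toy input sequences
    target = torch.randn(32, 2)            # placeholder targets for illustration
    loss = nn.functional.mse_loss(model(u), target)
    penalty = sigma2 * jacobian_frobenius_sq(model, u)  # output-variance proxy
    (loss + lam * penalty).backward()
    opt.step()
    opt.zero_grad()
```

A single random probe keeps the cost of the penalty at one extra backward pass per step; averaging over several probes would reduce the variance of the Frobenius-norm estimate at proportionally higher cost.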
