Efficient Mask Attention-Based NARMAX (MAB-NARMAX) Model Identification

Model structure selection is crucial in system identification and data-driven modelling. Because prior knowledge of the system of interest is often lacking, many spurious candidate variables can distort the determination of the model structure. A common approach is to test as many candidate models as possible and select a set of the best ones. This study proposes a novel mask attention-based NARMAX (MAB-NARMAX) modelling method for nonlinear dynamic system identification. The mask attention mechanism, borrowed from the widely used Transformer neural network, is used to reduce the dependency among features and neurons. The performance of the proposed method is tested on three simulation datasets. Results show that the proposed MAB-NARMAX modelling framework achieves convincing multistep-ahead prediction performance for nonlinear system identification, in that it can produce a precise model structure. Even when the data are polluted with high noise (resulting in a low SNR), the proposed method can still generate reliable system models when compared with state-of-the-art machine learning methods such as LASSO and LSTM.
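
For readers unfamiliar with the mask attention idea taken from the Transformer, the following is a minimal NumPy sketch of generic masked scaled dot-product attention. It is not the MAB-NARMAX algorithm itself, whose details the abstract does not give. The function name `masked_attention`, the array shapes, and the choice of which positions to mask are all illustrative assumptions, and the sketch assumes every query row keeps at least one unmasked position.

```python
import numpy as np

def masked_attention(Q, K, V, mask):
    """Generic masked scaled dot-product attention (illustrative sketch).

    Q: (n_q, d) queries, K: (n_k, d) keys, V: (n_k, d_v) values.
    mask: (n_q, n_k) boolean array, True = position may be attended to.
    Assumes each row of `mask` contains at least one True entry.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # similarity of each query to each key
    scores = np.where(mask, scores, -np.inf)       # masked positions get -inf
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                       # exp(-inf) = 0 for masked positions
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Hypothetical toy data: 3 queries attending over 5 candidate positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))

mask = np.ones((3, 5), dtype=bool)
mask[:, 3:] = False                                # hide the last two positions

out, weights = masked_attention(Q, K, V, mask)
```

Masked positions receive exactly zero weight, so they cannot contribute to the output. This is the general sense in which a mask can suppress dependencies on selected inputs.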
