An Aggregated Convolutional Transformer Based on Slices and Channels for Multivariate Time Series Classification

Convolutional neural networks (CNNs) have achieved remarkable success and excel at extracting local features. The Transformer has likewise developed rapidly in recent years and has attracted wide attention for its strong representation of global features. For multivariate time series classification, most previous networks have relied on convolutional and long short-term memory (LSTM) structures. This paper proposes a combination of Transformer-encoder and convolutional structures, which we refer to as the Multivariate time series classification Convolutional Transformer Network (MCTNet). The complementary advantages of convolutional neural networks and self-attention are used to capture potential deep information in multivariate time series more accurately. Because the Transformer is considered data-hungry, we combine it with the inductive bias of the convolutional neural network: early features are extracted through convolutional layers, and a both squeeze and excitation convolution encoder (BC-Encoder) structure is proposed. Attentional prototype learning is also used to mitigate the limited-label problem. Moreover, a new network design that focuses on slices and channels is proposed, moving beyond the notion that using a Transformer requires many parameters. Experimental results on 26 datasets from the well-known UEA multivariate time series archive show that our model performs better than most state-of-the-art models.
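To make the squeeze-and-excitation idea mentioned above concrete, the following is a minimal sketch of generic channel gating applied to a multivariate time series. It is not the BC-Encoder itself, whose details the abstract does not give. The function name `squeeze_excite`, the weight matrices `w1` and `w2`, and the toy shapes are all assumptions made for illustration.

```python
import math
import random


def squeeze_excite(x, w1, w2):
    """Generic squeeze-and-excitation gating (illustrative, not the paper's BC-Encoder).

    x  : list of channels, each a list of time steps
    w1 : hidden x channels weights (hypothetical bottleneck layer)
    w2 : channels x hidden weights (hypothetical expansion layer)
    """
    # Squeeze: summarize each channel by its mean over time.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation, step 1: bottleneck layer with ReLU.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    # Excitation, step 2: one sigmoid gate per channel.
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h)))) for row in w2]
    # Rescale every time step of a channel by that channel's gate.
    y = [[si * v for v in ch] for si, ch in zip(s, x)]
    return y, s


random.seed(0)
channels, steps, hidden = 6, 50, 3  # toy sizes, chosen arbitrarily
x = [[random.gauss(0.0, 1.0) for _ in range(steps)] for _ in range(channels)]
w1 = [[random.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(hidden)]
w2 = [[random.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(channels)]
y, gates = squeeze_excite(x, w1, w2)
```

Each channel is multiplied by a single learned weight between 0 and 1, so informative variables can be emphasized without changing the length of the series. In a trained network `w1` and `w2` would be learned rather than sampled at random.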
