Soft Sensing Transformer: Hundreds of Sensors are Worth a Single Word

With the rapid development of AI in recent years, deep learning models have been widely studied in the soft sensing area. However, while the models have grown more and more complex, the data sets remain limited: researchers are fitting million-parameter models with only hundreds of data samples, which is insufficient to demonstrate the effectiveness of their models. To address this long-standing problem, we release large-scale, high-dimensional time series manufacturing sensor data from Seagate Technology to the public. We demonstrate the challenges and effectiveness of modeling industrial big data with a Soft Sensing Transformer model on these data sets. We use a Transformer because it has outperformed state-of-the-art techniques in natural language processing, and has since also performed well when applied directly to computer vision without the introduction of image-specific inductive biases. We observe the similarity between sentence structure and multi-variable sensor readings, and process the sensor readings in a manner similar to sentences in natural language: the high-dimensional time series data is formatted into the same shape as embedded sentences and fed into the Transformer model. The results show that the Transformer model outperforms benchmark soft sensing models based on auto-encoders and long short-term memory (LSTM) networks. To the best of our knowledge, we are the first team in academia or industry to benchmark the performance of the original Transformer model on large-scale numerical soft sensing data. Additionally, in contrast to natural language processing or computer vision tasks, where human-level performance is regarded as the gold standard, our large-scale soft sensing study is an example of a Transformer interpreting high-dimensional numerical data that is not interpretable by humans.
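
As a rough illustration of the sentence analogy described above, the PyTorch sketch below projects each time step's sensor vector into the embedding dimension (the role a word embedding plays in NLP), adds learned positional embeddings, and runs the result through a standard Transformer encoder. All dimensions, layer counts, and the mean-pooled classification head are illustrative assumptions, not the paper's reported configuration.

import torch
import torch.nn as nn

class SoftSensingTransformer(nn.Module):
    # A minimal sketch: a window of multi-variate sensor readings is
    # shaped and processed like an embedded sentence. Hyperparameters
    # here are assumptions chosen for illustration only.
    def __init__(self, n_sensors=128, seq_len=20, d_model=256,
                 n_heads=8, n_layers=4, n_classes=2):
        super().__init__()
        # Each time step's sensor vector plays the role of one word:
        # a linear projection maps it into the embedding dimension.
        self.embed = nn.Linear(n_sensors, d_model)
        # Learned positional embeddings, one per time step ("word position").
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):
        # x: (batch, seq_len, n_sensors) -- a window of sensor readings,
        # shaped exactly like a batch of embedded sentences.
        h = self.embed(x) + self.pos
        h = self.encoder(h)
        # Mean-pool over time steps before the classification head.
        return self.head(h.mean(dim=1))

model = SoftSensingTransformer()
dummy = torch.randn(8, 20, 128)   # 8 windows, 20 time steps, 128 sensors
logits = model(dummy)             # -> shape (8, 2)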
