Monitoring big process data of industrial plants with multiple operating modes based on Hadoop

Abstract: To model and monitor large-scale plant-wide processes with big data from multiple operating conditions, a novel distributed parallel Gaussian mixture model is proposed based on the Hadoop MapReduce framework. To deal with high-dimensional process variables, a multiblock method is adopted. For the big data chunks in each divided block, the analysis proceeds in three key steps. First, the fundamental data statistics needed for data standardization are computed in a distributed and parallel manner. Second, the conventional Gaussian mixture model learning steps are adapted to the parallel paradigm of the MapReduce platform. Finally, multilevel fault detection and diagnosis schemes are developed to conduct hierarchical monitoring at the plant-wide, unit block, and variable levels. The feasibility and effectiveness of the proposed method are demonstrated on two case studies.
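
The first of the three steps, computing standardization statistics over data chunks in a map-and-reduce fashion, can be illustrated with a minimal sketch. This is not the authors' Hadoop implementation: it is plain Python that only mimics the map/reduce pattern in a single process, and the function names (map_chunk, reduce_stats, global_mean_std, standardize) and the toy data are assumptions made for illustration. Each chunk emits its count, per-variable sums, and per-variable sums of squares; the reduce step adds these partial results, from which the global mean and sample standard deviation follow.

```python
from functools import reduce
from math import sqrt


def map_chunk(chunk):
    """Map step (hypothetical): per-chunk count, sums, and sums of squares."""
    n_vars = len(chunk[0])
    sums = [0.0] * n_vars
    sq_sums = [0.0] * n_vars
    for row in chunk:
        for j, x in enumerate(row):
            sums[j] += x
            sq_sums[j] += x * x
    return len(chunk), sums, sq_sums


def reduce_stats(a, b):
    """Reduce step (hypothetical): merge two partial results by addition."""
    return (
        a[0] + b[0],
        [x + y for x, y in zip(a[1], b[1])],
        [x + y for x, y in zip(a[2], b[2])],
    )


def global_mean_std(chunks):
    """Combine all chunk statistics into global means and sample std devs."""
    n, sums, sq_sums = reduce(reduce_stats, map(map_chunk, chunks))
    means = [s / n for s in sums]
    # Sample variance: (sum of x^2 - n * mean^2) / (n - 1)
    stds = [sqrt((q - n * m * m) / (n - 1)) for q, m in zip(sq_sums, means)]
    return means, stds


def standardize(chunk, means, stds):
    """Scale every row of a chunk to zero mean and unit variance."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in chunk]


# Toy data: two chunks of a two-variable process (illustrative values only).
chunks = [
    [[1.0, 10.0], [2.0, 20.0]],
    [[3.0, 30.0], [4.0, 40.0], [5.0, 50.0]],
]
means, stds = global_mean_std(chunks)
scaled = [standardize(c, means, stds) for c in chunks]
```

Because the partial results are merged by simple addition, the chunks can be processed in any order and on any number of workers. The sum-of-squares formula is used here for brevity; it can lose precision when the mean is large relative to the spread, so a production version would likely prefer a numerically safer merge of per-chunk means and variances.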
