Big data quality prediction in the process industry: A distributed parallel modeling framework

Abstract: As the volume of data collected from industrial processes keeps growing, the process industry has entered the era of big data. As a result, data modeling and analytics on a single machine have become increasingly computationally demanding, particularly for large-scale processes. In this paper, a distributed parallel process modeling approach based on the MapReduce framework is presented for big data quality prediction. First, the architecture for distributed parallel data modeling is formulated under the MapReduce framework. Second, a big data quality prediction scheme is developed on top of this modeling approach. As an example, the basic semi-supervised probabilistic principal component regression (SSPPCR) model is used to train a set of local models concurrently on split datasets. Bayes' rule is then applied in a MapReduce fashion to integrate the local models according to their predictive abilities. Two case studies demonstrate the effectiveness of the proposed method for big data quality prediction.
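The split, train, and integrate pattern described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a univariate least-squares line stands in for SSPPCR, the "map" step runs serially through Python's built-in `map` rather than on a cluster, and the local models are weighted by their posterior probability given a held-out validation set, which is only one plausible reading of "based on their predictive abilities". All function names (`make_data`, `split_blocks`, `fit_local_model`, `posterior_weights`, `predict`) and the synthetic data are hypothetical.

```python
import math
import random


def make_data(n, seed):
    """Synthetic (x, y) pairs from y = 2x + 1 plus Gaussian noise (assumed toy process)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


def split_blocks(data, k):
    """Split the dataset into k interleaved blocks, one per mapper."""
    return [data[i::k] for i in range(k)]


def fit_local_model(block):
    """Map step: fit y = a*x + b on one block (least squares, a stand-in for SSPPCR)."""
    n = len(block)
    mean_x = sum(x for x, _ in block) / n
    mean_y = sum(y for _, y in block) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in block)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in block)
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Residual variance of the local model, floored to avoid log(0) later.
    var = max(sum((y - (a * x + b)) ** 2 for x, y in block) / n, 1e-12)
    return {"a": a, "b": b, "var": var, "n": n}


def log_likelihood(model, validation):
    """Gaussian log-likelihood of the validation samples under one local model."""
    total = 0.0
    for x, y in validation:
        residual = y - (model["a"] * x + model["b"])
        total += -0.5 * math.log(2.0 * math.pi * model["var"])
        total -= residual ** 2 / (2.0 * model["var"])
    return total


def posterior_weights(models, validation):
    """Reduce step: Bayes' rule, P(model | data) proportional to prior * likelihood."""
    total_n = sum(m["n"] for m in models)
    # Prior proportional to block size (an assumption of this sketch).
    logs = [math.log(m["n"] / total_n) + log_likelihood(m, validation) for m in models]
    top = max(logs)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(value - top) for value in logs]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def predict(models, weights, x):
    """Posterior-weighted combination of the local predictions."""
    return sum(w * (m["a"] * x + m["b"]) for m, w in zip(models, weights))


if __name__ == "__main__":
    train = make_data(400, seed=0)
    validation = make_data(50, seed=1)
    local_models = list(map(fit_local_model, split_blocks(train, 4)))
    weights = posterior_weights(local_models, validation)
    print("weights:", weights)
    print("prediction at x = 3:", predict(local_models, weights, 3.0))
```

In a real deployment the `map` call would be replaced by a MapReduce job (or, on one machine, a process pool) so that each block is trained on a separate worker; the weighting step only needs the fitted parameters, not the raw blocks.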
