Multisource Latent Feature Selective Ensemble Modeling Approach for Small-Sample High-Dimensional Process Data in Applications

Several difficult-to-measure production quality and environmental pollution indices of industrial processes must be measured using offline laboratory instruments. Soft measurement methods are often used to predict such parameters online. However, because of the limitations of the measurement devices and the complex characteristics of the process, only small-sample modeling data with high-dimensional input features can be obtained. Therefore, a new multisource latent feature selective ensemble (SEN) modeling approach is proposed in this study. First, the input features are divided into subgroups according to the characteristics of the modeling data. Second, multisource latent features are extracted for each subgroup by multi-layered selection algorithms, which apply, in order, the feature reduction ratio, the feature contribution ratio, and the mutual information value. Third, an adaptive hyper-parameter selection algorithm based on multi-step grid search is applied to the reduced features to construct candidate sub-models. Finally, the optimized ensemble sub-models and their weighting strategy are adaptively determined to build the final SEN model. The proposed method is verified using benchmark near-infrared data, high-dimensional mechanical frequency spectrum data, and industrial dioxin emission concentration data.
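The last two stages described above (multi-step grid search and selective ensemble construction) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the search ranges, the selection criterion, or the weighting strategies, so the coarse-to-fine refinement rule, the RMSE-based ranking, the inverse-error weights, the minimum ensemble size of two, and all function names below are assumptions made for illustration only.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def multi_step_grid_search(objective, low, high, points=5, steps=3):
    """Coarse-to-fine search over one hyper-parameter (assumed scheme).

    Each step evaluates an evenly spaced grid on [low, high], then narrows
    the range to one grid spacing on either side of the best point so far.
    """
    best_x, best_val = None, float("inf")
    for _ in range(steps):
        width = (high - low) / (points - 1)
        for i in range(points):
            x = low + i * width
            val = objective(x)
            if val < best_val:
                best_x, best_val = x, val
        low, high = best_x - width, best_x + width
    return best_x, best_val


def selective_ensemble(candidate_preds, target):
    """Pick a subset of candidate sub-models and their weights (assumed scheme).

    candidate_preds maps a sub-model name to its validation predictions.
    Candidates are ranked by validation RMSE; for every ensemble size from
    2 upward, the top-ranked members are combined with weights proportional
    to 1 / RMSE, and the size with the lowest combined RMSE is kept.
    """
    errors = {name: rmse(pred, target) for name, pred in candidate_preds.items()}
    ranked = sorted(errors, key=errors.get)
    best_err, best_members, best_weights = float("inf"), [], {}
    for size in range(2, len(ranked) + 1):
        members = ranked[:size]
        inverse = {m: 1.0 / (errors[m] + 1e-12) for m in members}
        total = sum(inverse.values())
        weights = {m: v / total for m, v in inverse.items()}
        combined = [
            sum(weights[m] * candidate_preds[m][i] for m in members)
            for i in range(len(target))
        ]
        err = rmse(combined, target)
        if err < best_err:
            best_err, best_members, best_weights = err, members, weights
    return best_err, best_members, best_weights


# Toy usage with made-up numbers (not data from the paper).
validation_target = [1.0, 2.0, 3.0, 4.0]
toy_candidates = {
    "submodel_a": [1.1, 2.1, 3.1, 4.1],  # small positive bias
    "submodel_b": [0.9, 1.9, 2.9, 3.9],  # small negative bias
    "submodel_c": [3.0, 3.0, 3.0, 3.0],  # poor candidate
}
ensemble_error, ensemble_members, ensemble_weights = selective_ensemble(
    toy_candidates, validation_target
)
tuned_value, tuned_loss = multi_step_grid_search(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
```

In this toy case the two candidates with opposite biases cancel each other, which is the usual motivation for selecting a subset of diverse sub-models rather than combining all of them.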
