Multi-source Heterogeneous Data Fusion for Toxin Level Quantification

Abstract The operational management of wastewater treatment plants (WWTPs) is a complex activity due to the intricate nature of the biological phenomena involved. This complexity hinders the adoption of first-principles approaches, which lack the accuracy required in practice. Data-driven methodologies also face significant challenges in processing the different information sources available. In this work, we present a data-driven and model-agnostic data-fusion framework to estimate the concentration level of a toxin in the effluent, using heterogeneous data (sensor data, images, laboratory measurements) collected at different locations in the process. Single- and multi-source modeling approaches are applied and compared. Among the methodologies tested, Bayesian fusion stands out as offering a good balance of accuracy, stability, and flexibility.
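To make the idea of Bayesian fusion concrete, the following is a minimal sketch of one common textbook form: combining independent Gaussian estimates of the same quantity by precision (inverse-variance) weighting. It is an illustration under stated assumptions, not the framework described in the abstract; the function name `fuse_gaussian_estimates`, the Gaussian and independence assumptions, and the example numbers are all hypothetical.

```python
import numpy as np


def fuse_gaussian_estimates(means, variances):
    """Fuse independent Gaussian estimates of one quantity.

    Assumption (not from the source): each data source yields an estimate
    of the toxin concentration with a known, strictly positive variance,
    and the errors of the sources are independent. Under a flat prior,
    the posterior is Gaussian with precision equal to the sum of the
    individual precisions.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    if means.ndim != 1 or means.size == 0 or means.shape != variances.shape:
        raise ValueError("means and variances must be non-empty 1-D arrays of equal length")
    if np.any(variances <= 0):
        raise ValueError("variances must be strictly positive")

    precisions = 1.0 / variances
    fused_variance = 1.0 / precisions.sum()
    # Each source contributes in proportion to its precision.
    fused_mean = fused_variance * (precisions * means).sum()
    return float(fused_mean), float(fused_variance)


# Hypothetical example: a sensor-based model and an image-based model
# predict the same concentration with different uncertainties.
fused_mean, fused_variance = fuse_gaussian_estimates([2.0, 4.0], [1.0, 3.0])
```

In this sketch the less uncertain source pulls the fused estimate toward its own prediction, and the fused variance is never larger than the smallest input variance, which is one reason such schemes tend to behave stably when a single source degrades.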
