MBA-GUI: A chemometric graphical user interface for multi-block data visualisation, regression, classification, variable selection and automated pre-processing

In recent years, advances in sensor technology have made multi-modal measurement of process and product properties easier. However, multi-modal measurements are only useful if the information gained from adding new sensors is worthwhile, especially in industrial applications where the purchase and integration of a new sensor must be financially justified, and if the resulting multi-modal data can be properly utilised. Several multi-block methods have been developed to exploit such data; however, their use is largely limited to chemometricians, and non-experts have little experience with them. To address this, we present the first version of a MATLAB-based graphical user interface (GUI) for multi-block data analysis (MBA), capable of performing data visualisation, regression, classification and variable selection on data from up to four different sensors. The MBA-GUI can also be used to implement a recent technique called sequential pre-processing through orthogonalization (SPORT). Data sets are supplied to demonstrate how to use the MBA-GUI. In summary, the developed GUI makes multi-block data analysis easier to implement, so that it can also be used by practitioners who have no programming skills or are unfamiliar with the MATLAB environment. The fully functional GUI can be downloaded from https://github.com/puneetmishra2/Multi-block.git and can either be installed to run in the MATLAB environment or be run as a standalone executable program. The GUI can also be used to analyse a single block of data (standard chemometrics).
