Multiblock-Networks: A Neural Network Analog to Component Based Methods for Multi-Source Data

Training predictive models on datasets from multiple sources is a common yet challenging setup in applied machine learning. Even though model interpretation has attracted more attention in recent years, many modeling approaches still focus mainly on performance. To further improve the interpretability of machine learning models, we suggest adopting concepts and tools from the well-established framework of component-based multiblock analysis, also known as chemometrics. Artificial neural networks, however, provide greater flexibility in model architecture and thus often deliver superior predictive performance. In this study, we propose a setup to transfer the concepts of component-based statistical models, including multiblock variants of principal component regression and partial least squares regression, to neural network architectures. We thereby combine the flexibility of neural networks with the concepts for interpreting block relevance in multiblock methods. In two use cases, we demonstrate how the concept can be implemented in practice and compare it to both common feed-forward neural networks without blocks and statistical component-based multiblock methods. Our results underline that multiblock networks allow for basic model interpretation while matching the performance of ordinary feed-forward neural networks.
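The abstract does not specify the architecture, so the following is only a minimal sketch of one possible reading of a multiblock network: each data block is compressed by its own small encoder into a few component-like scores, and the concatenated scores feed a shared linear output layer. All names (`init_multiblock_net`, `forward`), the `tanh` activation, the layer sizes, and the random data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def init_multiblock_net(block_dims, n_components, n_outputs, seed=0):
    """Create random weights: one encoder per block plus a shared linear head.

    Hypothetical layout, assumed for illustration only.
    """
    rng = np.random.default_rng(seed)
    encoders = [rng.normal(scale=0.1, size=(d, n_components)) for d in block_dims]
    head = rng.normal(scale=0.1, size=(n_components * len(block_dims), n_outputs))
    return encoders, head

def forward(blocks, encoders, head):
    """Forward pass: block-wise scores, concatenation, linear prediction."""
    # Each block is mapped to its own low-dimensional scores,
    # loosely analogous to block components in multiblock PCR/PLS.
    scores = [np.tanh(X @ W) for X, W in zip(blocks, encoders)]
    concatenated = np.concatenate(scores, axis=1)
    return concatenated @ head, scores

# Two synthetic blocks measured on the same 8 samples (5 and 3 variables).
rng = np.random.default_rng(1)
blocks = [rng.normal(size=(8, 5)), rng.normal(size=(8, 3))]

encoders, head = init_multiblock_net([5, 3], n_components=2, n_outputs=1)
y_hat, scores = forward(blocks, encoders, head)
```

Keeping the per-block scores separate until the final layer is what would make a block-level inspection possible in such a design; a single feed-forward network on the concatenated raw inputs mixes the blocks from the first layer onward.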
