PLS model building with missing data: New algorithms and a comparative study

This article presents new algorithms for handling missing values in predictive modelling. Specifically, two adaptations of trimmed scores regression (TSR) are proposed: one derived from principal component analysis model building with missing data (MD), and the other from partial least squares (PLS) regression model exploitation with missing values. With these methods, practitioners can impute MD in both the explanatory (predictor) and the dependent (response) variables. PLS is used here to build the multivariate calibration models; however, any regression method can be applied once the MD have been imputed. Four case studies with different latent structures are analysed to compare the TSR-based methods against state-of-the-art approaches. The MATLAB code for these methods is available for direct use at http://mseg.webs.upv.es, under a GNU license.
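The impute-then-regress workflow described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function name `tsr_like_impute`, its parameters, and the synthetic data are hypothetical, and the update rule (regressing the missing variables of each row on the scores computed from its observed variables) is only an assumption about how a TSR-style model-building step might look. Predictors and response are stacked into one matrix so that gaps in both can be filled, and ordinary least squares stands in for PLS as the final regression.

```python
import numpy as np

def tsr_like_impute(X, n_components=2, max_iter=500, tol=1e-10):
    """Iteratively fill NaNs by regressing missing variables on trimmed scores.

    Hypothetical sketch, not the published implementation.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        previous = filled.copy()
        mean = filled.mean(axis=0)
        centred = filled - mean
        S = np.cov(centred, rowvar=False)
        # PCA loadings from the SVD of the centred, currently filled matrix.
        _, _, Vt = np.linalg.svd(centred, full_matrices=False)
        P = Vt[:n_components].T
        for i in np.flatnonzero(missing.any(axis=1)):
            mis = missing[i]
            obs = ~mis
            if not obs.any():
                continue  # nothing observed in this row: keep the current fill
            L = P[obs]                   # loadings trimmed to observed variables
            scores = L.T @ centred[i, obs]
            # Least-squares regression of the missing variables on the scores.
            coef = S[np.ix_(mis, obs)] @ L @ np.linalg.pinv(L.T @ S[np.ix_(obs, obs)] @ L)
            filled[i, mis] = mean[mis] + coef @ scores
        change = np.linalg.norm(filled - previous)
        if change <= tol * max(1.0, np.linalg.norm(previous)):
            break
    return filled

# Synthetic example (assumed data): 5 predictors and 1 response driven by
# 2 latent variables, with one entry removed in every third row.
rng = np.random.default_rng(0)
Z = rng.normal(size=(80, 2)) @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(80, 6))
mask = np.zeros(Z.shape, dtype=bool)
rows = np.arange(0, 80, 3)
mask[rows, np.arange(len(rows)) % 6] = True
Z_missing = np.where(mask, np.nan, Z)

# Impute predictors and response jointly, then split them again.
Z_filled = tsr_like_impute(Z_missing, n_components=2)
X_filled, y_filled = Z_filled[:, :5], Z_filled[:, 5]

# Ordinary least squares as a stand-in for the PLS calibration model.
design = np.column_stack([np.ones(len(X_filled)), X_filled])
beta, *_ = np.linalg.lstsq(design, y_filled, rcond=None)
```

Only the entries that were missing are overwritten, so the observed data reach the regression step unchanged. The number of components is a tuning choice that this sketch leaves to the user.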
