Missing Data Imputation Toolbox for MATLAB

Abstract

Here we introduce the Missing Data Imputation (MDI) Toolbox, a user-friendly graphical interface for dealing with missing values. This MATLAB toolbox imputes missing values that follow a missing completely at random (MCAR) pattern by exploiting the relationships among the variables. To do so, principal component analysis (PCA) models are fitted iteratively to impute the missing data until convergence. The toolbox includes several methods that use PCA internally: trimmed scores regression (TSR), known data regression (KDR), KDR with principal component regression (KDR-PCR), KDR with partial least squares regression (KDR-PLS), projection to the model plane (PMP), the iterative algorithm (IA), a modified nonlinear iterative partial least squares (NIPALS) regression algorithm, and data augmentation (DA). Because MDI Toolbox implements a general imputation procedure, it can be used to build PCA models from incomplete data, to estimate the covariance structure of incomplete data matrices, or to impute missing values as a preprocessing step for other methodologies.
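The iterative PCA idea can be illustrated outside MATLAB. The following is a minimal Python/NumPy sketch of the iterative algorithm (IA) only. It assumes that missing entries are coded as NaN, that the data are MCAR, and that the number of principal components is fixed in advance. The function `pca_iterative_impute` and its parameters are hypothetical illustrations, not the MDI Toolbox API. Missing cells start at their column means. A PCA model is then repeatedly fitted to the completed matrix, and only the missing cells are overwritten with the model's reconstruction until the imputed values stop changing.

```python
import numpy as np


def pca_iterative_impute(X, n_components, tol=1e-8, max_iter=500):
    """Impute NaN entries of X by iteratively refitting a PCA model.

    Hypothetical helper sketching the iterative algorithm (IA);
    it is not part of MDI Toolbox.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if not missing.any():
        return X.copy()

    # Initial guess: column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    Xc = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # Re-estimate the column means of the completed matrix.
        mu = Xc.mean(axis=0)
        # Fit a PCA model with n_components via the SVD of the centered data.
        U, s, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
        T = U[:, :n_components] * s[:n_components]  # scores
        P = Vt[:n_components].T                      # loadings
        X_hat = T @ P.T + mu                         # model reconstruction

        # Overwrite only the missing cells; observed values are kept.
        new_vals = X_hat[missing]
        change = np.max(np.abs(new_vals - Xc[missing]))
        Xc[missing] = new_vals
        if change < tol:
            break
    return Xc
```

For instance, with `X = np.outer(np.arange(1.0, 6.0), [1.0, 2.0, 3.0])`, setting `X[0, 2] = np.nan` and calling `pca_iterative_impute(X, n_components=1)` should return a matrix whose `[0, 2]` entry is close to the removed value 3, because these data are exactly rank one. Real data are rarely this clean. Choosing the number of components is a separate problem that this sketch leaves to the caller.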
