Imputation of Missing Values in Economic and Financial Time Series Data Using Five Principal Component Analysis (PCA) Approaches

Chisimkwuo John, Emmanuel J. Ekpenyong and Charles C. Nworu

This study assesses five PCA-based approaches for imputing missing values: Singular Value Decomposition imputation (svdPCA), Bayesian imputation (bPCA), Probabilistic imputation (pPCA), Non-linear Iterative Partial Least Squares imputation (nipalsPCA) and Local Least Squares imputation (llsPCA). Missing data at rates of 5%, 10%, 15% and 20% were generated in R under a missing completely at random (MCAR) assumption, using five variables observed from 1981 to 2019: Net Foreign Assets (NFA), Credit to Core Private Sector (CCP), Reserve Money (RM), Narrow Money (M1) and Private Sector Demand Deposits (PSDD). The five imputation methods were then used to estimate the artificially generated missing values, and their performance was evaluated using the Mean Forecast Error (MFE), Root Mean Squared Error (RMSE) and Normalized Root Mean Squared Error (NRMSE) criteria. The results suggest that bPCA, llsPCA and pPCA performed better than the other methods, with bPCA being the more appropriate method and llsPCA the best method, as it appears to be more stable than the others across the proportions of missingness.
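The evaluation design described in the abstract (mask known values under MCAR, impute them, then score the imputations against the held-back truth) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the code is Python with NumPy rather than the R workflow the study used, the 39 x 5 matrix is synthetic and only mimics the shape of the 1981 to 2019 data, `svd_impute` is a generic iterative low-rank SVD fill and not the implementation behind svdPCA, and NRMSE is normalized by the standard deviation of the true values, which is only one of several conventions. The helper names `make_mcar`, `svd_impute` and `error_metrics` are hypothetical.

```python
import numpy as np


def make_mcar(data, rate, rng):
    """Set roughly `rate` of all cells to NaN, chosen uniformly at random (MCAR)."""
    n_missing = int(round(rate * data.size))
    flat_idx = rng.choice(data.size, size=n_missing, replace=False)
    mask = np.zeros(data.size, dtype=bool)
    mask[flat_idx] = True
    mask = mask.reshape(data.shape)
    incomplete = data.astype(float).copy()
    incomplete[mask] = np.nan
    return incomplete, mask


def svd_impute(x, n_components=1, n_iter=100, tol=1e-6):
    """Generic iterative SVD fill (illustrative, not the study's svdPCA code)."""
    missing = np.isnan(x)
    # Start from column means, then refine with a low-rank reconstruction.
    filled = np.where(missing, np.nanmean(x, axis=0), x)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        u, s, vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = (u[:, :n_components] * s[:n_components]) @ vt[:n_components] + mu
        updated = np.where(missing, low_rank, x)  # observed cells stay fixed
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled


def error_metrics(truth, imputed, mask):
    """MFE, RMSE and NRMSE computed over the artificially removed cells only."""
    diff = truth[mask] - imputed[mask]
    mfe = diff.mean()
    rmse = np.sqrt(np.mean(diff ** 2))
    nrmse = rmse / truth[mask].std()  # assumed normalization: std of true values
    return mfe, rmse, nrmse


# Synthetic stand-in for five correlated annual series (39 years x 5 variables).
rng = np.random.default_rng(0)
trend = np.cumsum(rng.normal(1.0, 0.5, size=39))
loadings = np.array([0.8, 1.0, 1.2, 0.9, 1.1])
data = np.outer(trend, loadings) + rng.normal(0.0, 0.3, size=(39, 5))

results = {}
for rate in (0.05, 0.10, 0.15, 0.20):
    incomplete, mask = make_mcar(data, rate, rng)
    imputed = svd_impute(incomplete, n_components=1)
    results[rate] = error_metrics(data, imputed, mask)
    mfe, rmse, nrmse = results[rate]
    print(f"{rate:.0%} missing: MFE={mfe:.3f} RMSE={rmse:.3f} NRMSE={nrmse:.3f}")
```

Comparing several imputation methods would mean swapping `svd_impute` for another imputer inside the same loop and repeating the masking over multiple random seeds, so that the stability of each method across missingness rates can be judged rather than read from a single draw.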
