Missing data imputation in multivariate data by evolutionary algorithms

This paper proposes an evolutionary algorithm for imputing missing observations in multivariate data. The method is a genetic algorithm that minimizes an error function derived from the covariance matrix and mean vector of the data. All methodological aspects of the genetic structure are described, with an extended explanation of the design of the fitness function. The proposed method is then applied to an example.
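
The abstract does not state the exact fitness function or the genetic operators, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the reference mean vector and covariance matrix are estimated from the fully observed rows, that the error is a sum of squared differences, and that truncation selection, uniform crossover, Gaussian mutation, and elitism are used. The names `fitness` and `ga_impute` and all parameters are hypothetical.

```python
import numpy as np

def fitness(candidate, X, mask, mu_ref, S_ref):
    """Squared error between the moments of the filled data and the references (assumed form)."""
    filled = X.copy()
    filled[mask] = candidate  # place candidate values into the missing cells
    mean_err = np.sum((filled.mean(axis=0) - mu_ref) ** 2)
    cov_err = np.sum((np.cov(filled, rowvar=False) - S_ref) ** 2)
    return mean_err + cov_err

def ga_impute(X, pop_size=40, generations=200, mutation_scale=0.1, seed=0):
    """Impute NaN entries of X (n rows, at least 2 columns) with a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    mask = np.isnan(X)
    # Assumption: reference moments come from the fully observed rows.
    complete = X[~mask.any(axis=1)]
    mu_ref = complete.mean(axis=0)
    S_ref = np.cov(complete, rowvar=False)
    # Column index of each missing cell, in the same row-major order as X[mask].
    cols = np.nonzero(mask)[1]
    sd = np.sqrt(np.diag(S_ref))[cols]
    # Initial population: one gene per missing cell, drawn around its column mean.
    pop = mu_ref[cols] + sd * rng.standard_normal((pop_size, cols.size))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(c, X, mask, mu_ref, S_ref) for c in pop])
        order = np.argsort(scores)
        pop = pop[order]
        history.append(scores[order[0]])
        parents = pop[: pop_size // 2]  # truncation selection
        a = parents[rng.integers(0, len(parents), pop_size - 1)]
        b = parents[rng.integers(0, len(parents), pop_size - 1)]
        children = np.where(rng.random(a.shape) < 0.5, a, b)  # uniform crossover
        children = children + mutation_scale * sd * rng.standard_normal(children.shape)
        pop = np.vstack([pop[:1], children])  # elitism: keep the best individual unchanged
    scores = np.array([fitness(c, X, mask, mu_ref, S_ref) for c in pop])
    filled = X.copy()
    filled[mask] = pop[np.argmin(scores)]
    return filled, history
```

Calling `ga_impute(X)` on an array whose missing cells are `NaN` returns the filled array together with the best fitness value of each generation. Because the best individual is carried over unchanged, that sequence never increases.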
