Missing data: A comparison of neural network and expectation maximization techniques

Two techniques have emerged from the recent literature as candidate solutions to the problem of missing data imputation: the expectation maximization (EM) algorithm and the combination of an auto-associative neural network with a genetic algorithm (GA). Both techniques have been studied individually, and their merits discussed at length, in the available literature. However, they have not been compared with each other. This article compares the two techniques using datasets from an industrial power plant, an industrial winding process and an HIV sero-prevalence survey. Results show that the EM algorithm is more suitable and performs better where there is little or no interdependency between the input variables, whereas the auto-associative neural network and GA combination is suitable when there are inherent nonlinear relationships between some of the given variables.
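To make the first of the two techniques concrete, the following is a minimal sketch of EM-based imputation, assuming the data follow a single multivariate Gaussian. It is an illustration only and not the implementation evaluated in this article; the function name `em_impute`, its parameters and the small ridge term added to the covariance are assumptions introduced here. Under this model each missing entry is replaced by its conditional mean given the observed entries in the same row, which is a linear function of those entries.

```python
import numpy as np

def em_impute(X, n_iter=50, tol=1e-6, ridge=1e-6):
    """Fill NaN entries of X by EM under a multivariate Gaussian model (sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n, d = X.shape
    # Initialise with column means (assumes every column has an observed value).
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)
    sigma = np.cov(filled, rowvar=False) + ridge * np.eye(d)
    for _ in range(n_iter):
        previous = filled.copy()
        cond_cov_sum = np.zeros((d, d))
        # E-step: conditional mean and covariance of the missing entries, row by row.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the current mean.
                filled[i] = mu
                cond_cov_sum += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            gain = s_mo @ np.linalg.inv(s_oo)
            filled[i, m] = mu[m] + gain @ (filled[i, o] - mu[o])
            cond_cov_sum[np.ix_(m, m)] += sigma[np.ix_(m, m)] - gain @ s_mo.T
        # M-step: re-estimate the mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        centred = filled - mu
        sigma = (centred.T @ centred + cond_cov_sum) / n + ridge * np.eye(d)
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

Calling `em_impute(X)` on an array in which missing entries are marked as `np.nan` returns a completed copy with the observed entries left unchanged. Because the Gaussian model captures only linear dependence between variables, a sketch like this would not be expected to recover strongly nonlinear relationships.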
