Estimating Missing Data and Determining the Confidence of the Estimate Data

A computational intelligence approach to estimating missing data combines Autoassociative Neural Networks (ANNs) with a stochastic optimization technique. The ANN captures interrelationships within the data, and the optimization technique proposes probable values that are used as inputs to the ANN. The optimum estimate is the one that has a minimum influence on the output of the ANN. This paper presents a method to determine the confidence of that estimate. An ensemble of ANNs with a Multi-Layer Perceptron architecture is collected using Bayesian training methods. The percentage of estimates that share the most dominant value is used as a confidence measure. The method is applied to the South African antenatal seroprevalence survey data, where the HIV status of the patients is estimated. The missing data could be estimated with an overall accuracy of 68%, with confidence ranging between 50% and 97%. Estimates with a confidence exceeding 70% have an estimation accuracy of 88%.
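The estimation and confidence steps described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: "minimum influence on the output" is read as the smallest squared difference between the filled-in record and the network's reconstruction of it; an exhaustive search over a small set of candidate values stands in for the stochastic optimizer; and the toy functions `copy_first` and `invert_first` are hypothetical stand-ins for trained autoassociative networks. The names `estimate_missing` and `ensemble_estimate` are likewise illustrative only.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A "model" maps a record to its reconstruction (assumed interface).
Model = Callable[[Sequence[float]], Sequence[float]]


def estimate_missing(model: Model, record: Sequence[float],
                     missing_index: int,
                     candidates: Sequence[float]) -> float:
    """Pick the candidate whose filled-in record the model reconstructs best.

    Exhaustive search is used here in place of a stochastic optimizer.
    """
    def reconstruction_error(value: float) -> float:
        filled = list(record)
        filled[missing_index] = value
        output = model(filled)
        return sum((x - y) ** 2 for x, y in zip(filled, output))

    return min(candidates, key=reconstruction_error)


def ensemble_estimate(models: Sequence[Model], record: Sequence[float],
                      missing_index: int,
                      candidates: Sequence[float]) -> Tuple[float, float]:
    """Return the most common estimate and the fraction of models voting for it."""
    estimates: List[float] = [
        estimate_missing(m, record, missing_index, candidates) for m in models
    ]
    value, votes = Counter(estimates).most_common(1)[0]
    return value, votes / len(estimates)


# Hypothetical stand-ins for trained networks (not from the paper).
def copy_first(x: Sequence[float]) -> List[float]:
    # Behaves as if the second feature always equals the first.
    return [x[0], x[0]]


def invert_first(x: Sequence[float]) -> List[float]:
    # Behaves as if the second feature is the complement of the first.
    return [x[0], 1.0 - x[0]]


models = [copy_first, copy_first, copy_first, invert_first]
record = [1.0, 0.0]  # the value at index 1 is a placeholder for the missing entry
value, confidence = ensemble_estimate(models, record, missing_index=1,
                                      candidates=(0.0, 1.0))
print(value, confidence)
```

In this toy run three of the four stand-in models agree on the same value, so the confidence is the share of the ensemble that voted for it. A real implementation would replace the stand-ins with trained networks and the exhaustive search with an optimizer suited to continuous inputs.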
