Imputation of Missing Data Using PCA, Neuro-Fuzzy and Genetic Algorithms

This paper presents a method for imputing missing data that combines principal component analysis with neuro-fuzzy modeling (PCA-NF) and uses a genetic algorithm (GA) to estimate the missing values. The model's ability to impute missing data is tested on the South African HIV sero-prevalence dataset. The results indicate that average accuracy increases from 60% when the neuro-fuzzy model is used on its own to 99% when the proposed model is used.
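The general idea of pairing a reconstruction model with a GA search can be illustrated with a minimal sketch. The abstract does not specify the model details, so the following makes strong assumptions: a plain PCA reconstruction stands in for the trained PCA-NF model (no neuro-fuzzy component is implemented), and the GA is a simple real-valued one with elitism, blend crossover, and Gaussian mutation. The function names `fit_pca`, `reconstruction_error`, and `ga_impute`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the column means and the top principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def reconstruction_error(x, mean, components):
    """Squared error between a record and its PCA reconstruction.

    Assumption: this stands in for the error of the trained model;
    it is not the paper's neuro-fuzzy model.
    """
    z = (x - mean) @ components.T        # project onto the components
    x_hat = mean + z @ components        # map back to the input space
    return float(np.sum((x - x_hat) ** 2))


def ga_impute(x, missing_idx, mean, components, bounds,
              pop_size=40, generations=60, mutation_scale=0.1, seed=0):
    """Search for missing entries that minimise the reconstruction error."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Each individual is one candidate vector for the missing entries.
    pop = rng.uniform(lo, hi, size=(pop_size, len(missing_idx)))

    def fitness(candidate):
        trial = x.copy()
        trial[missing_idx] = candidate
        return reconstruction_error(trial, mean, components)

    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        # Elitism: the better half survives unchanged.
        parents = pop[np.argsort(scores)[: pop_size // 2]]
        n_children = pop_size - len(parents)
        i = rng.integers(0, len(parents), size=n_children)
        j = rng.integers(0, len(parents), size=n_children)
        # Blend crossover followed by Gaussian mutation.
        alpha = rng.uniform(size=(n_children, 1))
        children = alpha * parents[i] + (1 - alpha) * parents[j]
        children += rng.normal(0.0, mutation_scale * (hi - lo),
                               size=children.shape)
        pop = np.vstack([parents, np.clip(children, lo, hi)])

    scores = np.array([fitness(c) for c in pop])
    completed = x.copy()
    completed[missing_idx] = pop[np.argmin(scores)]
    return completed
```

In this sketch, a record with `np.nan` in the positions listed in `missing_idx` is passed to `ga_impute` together with a PCA fitted on complete records; the returned copy has those positions filled with the best candidate found. Swapping `reconstruction_error` for the error of a different trained model would leave the GA loop unchanged.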
