The use of genetic algorithms and neural networks to approximate missing data in database

Missing data creates various problems in analysing and processing data in databases. In this paper we introduce a new method for approximating missing data in a database using a combination of genetic algorithms and neural networks. The proposed method uses a genetic algorithm to minimise an error function derived from an auto-associative neural network. Both multi-layer perceptron (MLP) and radial basis function (RBF) architectures are used for the auto-associative network. We also investigate how accurately the proposed method predicts missing data as the number of missing values within a single record increases. We observe no significant reduction in accuracy as the number of missing values in a single record increases. We also find that the results obtained using RBF networks are superior to those obtained using MLP networks.
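The idea of minimising an auto-associative network's error over the unknown entries of a record can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`train_rbf_autoencoder`, `impute_with_ga`), the synthetic data, the scaling of all attributes to [0, 1], the least-squares RBF fit with randomly chosen centres, and the particular genetic operators (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism) are all choices made here for illustration.

```python
import numpy as np

def rbf_features(X, centres, width):
    """Gaussian basis activations for each row of X."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def train_rbf_autoencoder(X, n_centres=10, width=0.5, rng=None):
    """Fit an RBF network that maps each record back onto itself.
    Assumption: centres are random training rows, output weights by least squares."""
    rng = np.random.default_rng(rng)
    centres = X[rng.choice(len(X), size=n_centres, replace=False)]
    W, *_ = np.linalg.lstsq(rbf_features(X, centres, width), X, rcond=None)
    return centres, width, W

def reconstruction_error(X, model):
    """Squared difference between each record and the network's output for it."""
    centres, width, W = model
    X = np.atleast_2d(X)
    return ((X - rbf_features(X, centres, width) @ W) ** 2).sum(axis=1)

def impute_with_ga(record, missing, model, pop_size=40, generations=60,
                   mutation_scale=0.05, rng=None):
    """Search over the missing entries for values that minimise the error.
    Assumption: all attributes are scaled to [0, 1]."""
    rng = np.random.default_rng(rng)
    missing = np.asarray(missing)

    def fitness(pop):
        full = np.tile(record, (len(pop), 1))
        full[:, missing] = pop          # known entries stay fixed
        return reconstruction_error(full, model)

    pop = rng.uniform(0.0, 1.0, size=(pop_size, len(missing)))
    for _ in range(generations):
        err = fitness(pop)
        elite = pop[np.argmin(err)].copy()
        # Binary tournament selection
        a, b = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where((err[a] < err[b])[:, None], pop[a], pop[b])
        # Arithmetic crossover with a shuffled partner, then Gaussian mutation
        partners = parents[rng.permutation(pop_size)]
        alpha = rng.uniform(size=(pop_size, 1))
        children = alpha * parents + (1.0 - alpha) * partners
        children += rng.normal(0.0, mutation_scale, size=children.shape)
        pop = np.clip(children, 0.0, 1.0)
        pop[0] = elite                  # elitism: keep the best candidate so far
    err = fitness(pop)
    filled = np.array(record, dtype=float)
    filled[missing] = pop[np.argmin(err)]
    return filled, float(err.min())

# Synthetic, strongly correlated records (illustrative data only)
data_rng = np.random.default_rng(0)
t = data_rng.uniform(0.0, 1.0, size=200)
X = np.column_stack([t, t ** 2, 1.0 - t, 0.5 * t + 0.25])
model = train_rbf_autoencoder(X, n_centres=15, width=0.3, rng=1)

record = np.array([0.6, 0.36, 0.4, 0.55])
missing = [1, 2]
record[missing] = np.nan                # two missing values in one record
filled, err = impute_with_ga(record, missing, model, rng=2)
print(filled, err)
```

Swapping the RBF model for an MLP would only change `train_rbf_autoencoder` and `reconstruction_error`; the genetic search treats the network as a black-box error function, so it needs no gradients.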
