Modeling of missing data prediction: Computational intelligence and optimization algorithms

Four optimization algorithms (genetic algorithm (GA), simulated annealing (SA), particle swarm optimization (PSO) and random forest (RF)) were applied with an MLP-based autoassociative neural network to two classification datasets and one prediction dataset. The aim was to investigate how effective autoassociative neural networks combined with optimization algorithms are for missing data prediction and classification tasks. Applied appropriately, computational intelligence and optimization algorithm systems could yield consistent, accurate and trustworthy predictions and classifications, and so support better decisions. On the prediction task, estimating the forest area affected by fire, GA, SA and PSO were slightly more accurate than RF: all three reached 93.3% accuracy, while RF reached 92.99%. On the classification problems, RF achieved 93.66% and 92.11% accuracy on the German credit and Heart disease datasets respectively, outperforming GA, SA and PSO.
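The general idea of pairing an autoassociative model with an optimizer can be sketched as follows: the model is fitted to reproduce complete records, and the optimizer then searches for the missing values that minimize the reconstruction error of the completed record. This is a minimal sketch under stated assumptions, not the authors' implementation. A linear autoassociative map (a PCA projection) stands in for the trained MLP, a basic simulated annealing loop stands in for GA, SA and PSO, and the function names, the synthetic data and all parameter values are hypothetical.

```python
import numpy as np

def fit_linear_autoassociator(X, n_components):
    """Fit a linear autoassociative map (PCA projection).

    Assumption: this linear map is a stand-in for the trained MLP.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]  # rows span the principal subspace

    def reconstruct(x):
        # Project onto the subspace and map back to input space.
        return mean + (x - mean) @ basis.T @ basis

    return reconstruct

def reconstruction_error(reconstruct, x_known, missing_idx, candidate):
    """Squared error between a completed record and its reconstruction."""
    x = x_known.copy()
    x[missing_idx] = candidate  # fill the gaps with the candidate values
    return float(np.sum((x - reconstruct(x)) ** 2))

def anneal_missing(reconstruct, x_known, missing_idx, rng,
                   steps=2000, t0=1.0, step_size=0.5):
    """Simulated annealing over the missing entries only (hypothetical settings)."""
    current = np.zeros(len(missing_idx))
    current_err = reconstruction_error(reconstruct, x_known, missing_idx, current)
    best, best_err = current.copy(), current_err
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9  # linear cooling schedule
        proposal = current + rng.normal(scale=step_size, size=current.shape)
        err = reconstruction_error(reconstruct, x_known, missing_idx, proposal)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if err < current_err or rng.random() < np.exp((current_err - err) / temp):
            current, current_err = proposal, err
            if err < best_err:
                best, best_err = proposal.copy(), err
    return best, best_err

# Synthetic data (assumption): the third feature is the sum of the first two.
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([a, b, a + b])

reconstruct = fit_linear_autoassociator(X, n_components=2)
record = np.array([1.0, 2.0, np.nan])  # third value is missing
estimate, err = anneal_missing(reconstruct, record, [2], rng)
print(estimate, err)
```

Because the synthetic third feature is exactly the sum of the other two, the estimate for the missing entry should land close to 3. With real data, the fitted model only approximates the relationships between features, so the minimum of the reconstruction error is an estimate rather than a recovery of the true value.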
