Selection of Predictor Variables for Pneumonia Using Neural Networks and Genetic Algorithms

BACKGROUND Artificial neural networks (ANNs) can be used to select sets of predictor variables that incorporate nonlinear interactions between variables. We used a genetic algorithm, with selection based on maximizing network accuracy and minimizing network input-layer cardinality, to evolve parsimonious sets of variables for predicting community-acquired pneumonia among patients with respiratory complaints.

METHODS ANNs were trained on data from 1044 patients in a training cohort and applied to 116 patients in a testing cohort. Chromosomes with binary genes representing input-layer variables were operated on by crossover recombination, mutation, and probabilistic selection based on a fitness function incorporating both network accuracy and input-layer cardinality.

RESULTS The genetic algorithm evolved best 10-variable sets that discriminated pneumonia in the training cohort (ROC areas, 0.838 for selection based on average cross entropy (ENT); 0.954 for selection based on ROC area (ROC)) and in the testing cohort (ROC areas, 0.847 for ENT selection; 0.963 for ROC selection), with no significant differences between cohorts. The best variable sets from the genetic algorithm using ROC selection discriminated pneumonia more accurately than variable sets from stepwise neural networks (ROC areas, 0.954 versus 0.879, p = 0.030) or stepwise logistic regression (ROC areas, 0.954 versus 0.830, p < 0.001). Variable sets of lower cardinality were also evolved, and these also discriminated pneumonia accurately.

CONCLUSION Variable sets derived using a genetic algorithm for neural networks accurately discriminated pneumonia from other respiratory conditions and, in some cases, did so with greater accuracy than variable sets derived using stepwise neural networks or logistic regression.
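
The genetic search described in the methods can be illustrated with a minimal Python sketch. It is not the study's implementation. Each chromosome is a list of binary genes, one per candidate input variable, and the fitness rewards accuracy while penalizing the number of selected inputs. Every constant here (`N_VARS`, `POP_SIZE`, `GENERATIONS`, `MUTATION_RATE`, `PENALTY`) is a hypothetical value chosen for illustration, and `toy_accuracy` is a made-up stand-in for training a network on the selected inputs and measuring its cross entropy or ROC area. Roulette-wheel selection and single-point crossover are assumed as simple instances of the probabilistic selection and crossover recombination named above.

```python
import random

# All constants below are hypothetical; the abstract does not report them.
N_VARS = 20           # number of candidate input-layer variables
POP_SIZE = 30         # chromosomes per generation
GENERATIONS = 40      # number of generations to evolve
MUTATION_RATE = 0.02  # per-gene probability of flipping a bit
PENALTY = 0.01        # fitness cost per selected variable (cardinality term)

# Toy assumption: pretend only variables 0-4 carry predictive information.
INFORMATIVE = {0, 1, 2, 3, 4}


def toy_accuracy(chromosome):
    """Stand-in for training an ANN on the selected inputs and scoring it."""
    hits = sum(1 for i in INFORMATIVE if chromosome[i])
    return 0.5 + 0.09 * hits


def fitness(chromosome):
    """Reward accuracy and penalize input-layer cardinality."""
    return toy_accuracy(chromosome) - PENALTY * sum(chromosome)


def select(population, scores, rng):
    """Roulette-wheel selection: fitter chromosomes are more likely parents."""
    low = min(scores)
    weights = [s - low + 1e-6 for s in scores]  # shift so all weights are positive
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rng):
    """Flip each binary gene independently with probability MUTATION_RATE."""
    return [1 - g if rng.random() < MUTATION_RATE else g for g in chromosome]


def evolve(seed=0):
    """Run the genetic algorithm and return the fittest chromosome seen."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_VARS)]
                  for _ in range(POP_SIZE)]
    best = max(population, key=fitness)
    for _ in range(GENERATIONS):
        scores = [fitness(c) for c in population]
        population = [
            mutate(crossover(select(population, scores, rng),
                             select(population, scores, rng), rng), rng)
            for _ in range(POP_SIZE)
        ]
        best = max(population + [best], key=fitness)
    return best


if __name__ == "__main__":
    winner = evolve()
    print("selected variables:", [i for i, g in enumerate(winner) if g])
    print("fitness:", round(fitness(winner), 3))
```

In a real application the placeholder `toy_accuracy` would be replaced by a routine that trains a network on the selected variables and returns its accuracy on held-out data. The weight given to the cardinality penalty then controls how strongly the search favors smaller variable sets.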