Feature selection for in-silico drug design using genetic algorithms and neural networks

QSAR (quantitative structure-activity relationship) is a discipline within computational chemistry concerned with predictive modeling, often on relatively small datasets in which the number of features can exceed the number of data points, which leads to severe dimensionality problems. This paper presents a novel feature selection procedure for QSAR, based on genetic algorithms, that mitigates the curse of dimensionality. The genetic algorithm minimizes a cost function derived from the correlation matrix between the features and the activity being modeled. From a QSAR dataset with 160 features, the genetic algorithm selected a subset of 40 features that yielded a better predictive model than the full feature set. The feature reduction obtained with the genetic algorithm was also compared with that obtained from neural network sensitivity analysis.
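
The abstract does not state the exact form of the cost function, so the following is only a minimal sketch of how a genetic algorithm might search feature subsets using a correlation-derived cost. It assumes a negated correlation-based merit (high mean feature-activity correlation, low mean feature-feature correlation) as a stand-in for the paper's cost, together with tournament selection, uniform crossover, bit-flip mutation, and elitism. The names `correlation_cost` and `select_features`, and all parameter defaults, are hypothetical and are not taken from the paper.

```python
import numpy as np


def correlation_cost(mask, R_ff, r_fy):
    """Assumed cost (not the paper's): negated correlation-based merit.

    mask : boolean vector marking the selected features
    R_ff : feature-feature correlation matrix
    r_fy : feature-activity correlation vector
    """
    idx = np.flatnonzero(mask)
    k = idx.size
    if k == 0:
        return np.inf  # an empty subset is never acceptable
    relevance = np.abs(r_fy[idx]).mean()
    if k == 1:
        redundancy = 0.0
    else:
        sub = np.abs(R_ff[np.ix_(idx, idx)])
        # mean absolute off-diagonal correlation among selected features
        redundancy = (sub.sum() - np.trace(sub)) / (k * (k - 1))
    merit = k * relevance / np.sqrt(k + k * (k - 1) * redundancy)
    return -merit  # the GA minimizes, so negate the merit


def select_features(X, y, pop_size=40, n_gen=60, p_mut=0.02, seed=0):
    """Search binary feature masks with a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Correlation matrix of features and activity (activity is the last column).
    C = np.nan_to_num(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    R_ff, r_fy = C[:n, :n], C[:n, n]

    # Start the "best so far" at the full feature set, so the result
    # is never worse (under this cost) than using every feature.
    best_mask = np.ones(n, dtype=bool)
    best_cost = correlation_cost(best_mask, R_ff, r_fy)

    pop = rng.random((pop_size, n)) < 0.5  # random initial masks
    for _ in range(n_gen):
        costs = np.array([correlation_cost(m, R_ff, r_fy) for m in pop])
        i = costs.argmin()
        if costs[i] < best_cost:
            best_cost, best_mask = costs[i], pop[i].copy()

        # Binary tournament selection: keep the cheaper of two random individuals.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = np.where((costs[a] < costs[b])[:, None], pop[a], pop[b])

        # Uniform crossover with a randomly permuted partner.
        partners = parents[rng.permutation(pop_size)]
        take_parent = rng.random((pop_size, n)) < 0.5
        children = np.where(take_parent, parents, partners)

        # Bit-flip mutation, then elitism: re-insert the best mask found so far.
        pop = children ^ (rng.random((pop_size, n)) < p_mut)
        pop[0] = best_mask

    return np.flatnonzero(best_mask), best_cost
```

Calling `select_features(X, y)` on a samples-by-features array `X` and an activity vector `y` returns the indices of the selected features and the cost of that subset. This sketch lets the subset size vary freely; fixing it to a target size such as 40 would require an additional constraint or penalty term, which is not shown here.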