Bagging neural network sensitivity analysis for feature reduction for in-silico drug design

This paper illustrates a new approach to sensitivity analysis for feature selection that uses ensembles of neural networks trained in a bootstrapping mode with bagging. The methodology is applied to in-silico drug design with QSAR (quantitative structure-activity relationship) modeling, which is notoriously challenging for machine learning because there are typically on the order of 300-1000 dependent features, often for as few as 50-100 data points. For an HIV dataset with 160 wavelet descriptors, the number of relevant features was reduced to 35, and the resulting predictive neural network model gave better results than the model built on the full feature set.
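
The general idea of bagged sensitivity analysis can be illustrated with a minimal sketch. The abstract does not specify the network architecture, the sensitivity measure, or the selection rule, so everything below is an assumption made for illustration: a small one-hidden-layer tanh network trained by gradient descent, sensitivity defined as the swing in the output when one input sweeps its observed range while the others are held at their mean, and a synthetic dataset in which only the first two of six features matter. The function names (`train_mlp`, `sensitivity`, `bagged_sensitivity`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def train_mlp(X, y, hidden=4, epochs=1000, lr=0.05, rng=None):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden activations, (n, hidden)
        err = H @ W2 + b2 - y                    # residuals, (n,)
        gH = np.outer(err, W2) * (1.0 - H ** 2)  # backprop through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ gH) / n
        b1 -= lr * gH.mean(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2

def sensitivity(predict, X, steps=11):
    """Assumed measure: output swing as one input sweeps its observed range."""
    base = X.mean(axis=0)
    sens = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        grid = np.tile(base, (steps, 1))         # all inputs held at their mean
        grid[:, j] = np.linspace(X[:, j].min(), X[:, j].max(), steps)
        out = predict(grid)
        sens[j] = out.max() - out.min()
    return sens

def bagged_sensitivity(X, y, n_bags=10, seed=0):
    """Average per-feature sensitivities over networks fit to bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    all_sens = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, n)              # bootstrap: sample with replacement
        net = train_mlp(X[idx], y[idx], rng=rng)
        all_sens.append(sensitivity(net, X))
    return np.mean(all_sens, axis=0)

# Synthetic stand-in data (not the HIV dataset): only features 0 and 1 drive y.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-1.0, 1.0, (80, 6))
y = 1.0 * X[:, 0] - 0.8 * X[:, 1] + 0.05 * data_rng.normal(size=80)

scores = bagged_sensitivity(X, y)
ranking = np.argsort(scores)[::-1]               # most sensitive feature first
selected = sorted(ranking[:2].tolist())          # keep the top two (arbitrary cutoff)
print("mean sensitivities:", np.round(scores, 3))
print("selected features:", selected)
```

Averaging the sensitivities over the bag is meant to damp the variability of any single network fit to a small sample, which is the situation the abstract describes for QSAR data. The fixed top-two cutoff is arbitrary here; a real application would need a principled threshold for deciding which features to drop.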