From decision tree to heterogeneous decision forest: A novel chemometrics approach for structure-activity relationship modeling

Techniques for combining the predictions of multiple classification models into a single model have been investigated for many years. In earlier applications, the models to be combined were developed by altering the training set. The use of these so-called resampling techniques, however, increases the risk of reducing the predictivity of the models to be combined and/or overfitting the noise in the data, which might result in poorer prediction by the composite model than by the individual models. In this paper, we suggest a novel approach, named Heterogeneous Decision Forest (HDF), that combines multiple Decision Tree models. Each Decision Tree model is developed using a unique set of descriptors. When models of similar predictive quality are combined using the HDF method, the quality of the combined model is consistently and significantly better than that of the individual models in both the training and testing steps. An example is presented for predicting the binding affinity of 232 chemicals to the estrogen receptor.
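
To make the idea concrete, the following is a minimal sketch of a forest in which every tree is grown on the full training set but is restricted to its own set of descriptors. It is an illustration under stated assumptions, not the implementation used in this work: the abstract does not specify how the trees are combined, so averaging the trees' predicted class probabilities is assumed here, and the tree-growing routine (Gini splits, fixed depth), the disjoint descriptor sets, the synthetic data, and all function names are hypothetical.

```python
import random

def make_toy_data(n=60, n_descriptors=6, seed=0):
    """Synthetic stand-in for a descriptor matrix (NOT the estrogen receptor data)."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_descriptors)] for _ in range(n)]
    y = [1 if row[0] + row[2] + row[4] > 1.5 else 0 for row in X]
    return X, y

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(X, y, descriptors, depth=0, max_depth=3, min_leaf=2):
    """Grow a small binary tree that may split only on the given descriptor indices."""
    leaf = {"prob": sum(y) / len(y)}  # fraction of active compounds in this node
    if depth >= max_depth or len(set(y)) == 1:
        return leaf
    best = None
    for d in descriptors:
        for t in sorted({row[d] for row in X}):
            left = [i for i, row in enumerate(X) if row[d] <= t]
            right = [i for i, row in enumerate(X) if row[d] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            # Size-weighted impurity of the two child nodes.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, d, t, left, right)
    if best is None:
        return leaf
    _, d, t, left, right = best
    return {
        "descriptor": d,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           descriptors, depth + 1, max_depth, min_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            descriptors, depth + 1, max_depth, min_leaf),
    }

def tree_prob(tree, row):
    """Follow the splits down to a leaf and return its class-1 fraction."""
    while "prob" not in tree:
        branch = "left" if row[tree["descriptor"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["prob"]

def used_descriptors(tree):
    """Set of descriptor indices a tree actually splits on."""
    if "prob" in tree:
        return set()
    return ({tree["descriptor"]}
            | used_descriptors(tree["left"])
            | used_descriptors(tree["right"]))

def build_forest(X, y, descriptor_sets):
    """One tree per descriptor set; the training set itself is never resampled."""
    return [build_tree(X, y, descriptors) for descriptors in descriptor_sets]

def forest_prob(forest, row):
    """Assumed combination rule: unweighted mean of the trees' probabilities."""
    return sum(tree_prob(tree, row) for tree in forest) / len(forest)

X, y = make_toy_data()
descriptor_sets = [[0, 1], [2, 3], [4, 5]]  # hypothetical, non-overlapping sets
forest = build_forest(X, y, descriptor_sets)
predicted = [1 if forest_prob(forest, row) >= 0.5 else 0 for row in X]
```

The point of the sketch is the contrast with resampling: diversity among the trees comes from the descriptors each tree is allowed to use, while every tree sees all training compounds.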