Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models

Background: Standard approaches to assessing predictive models apply common statistical measures to the entire data set. They provide an overview of a model's average performance across the entire predictive space, but give little insight into the applicability of the model across that space. Guha and Van Drie recently proposed the use of structure-activity landscape index (SALI) curves and the SALI curve integral (SCI) as a means to map the predictive power of computational models within the predictive space. This approach evaluates model performance by assessing the accuracy of pairwise predictions, comparing compound pairs in a manner similar to that used by medicinal chemists.

Results: The SALI approach was used to evaluate the performance of continuous prediction models for MDR1-MDCK in vitro efflux potential. Efflux models were built with the ADMET Predictor neural net, support vector machine, kernel partial least squares, and multiple linear regression engines, as well as SIMCA-P+ partial least squares and random forest from Pipeline Pilot as implemented by AstraZeneca, using molecular descriptors from SimulationsPlus and AstraZeneca.

Conclusion: The results indicate that the choice of training set used to build a prediction model is of great importance to the resulting model quality, and that the SCI values calculated for these models were very similar to their Kendall τ values. On this basis, we suggest an approach that uses the SALI/SCI paradigm to evaluate predictive model performance and to allow more informed decisions regarding model utility. The use of SALI graphs and curves provides an additional level of quality assessment for predictive models.
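To make the pairwise idea concrete, the following minimal Python sketch scores a model by how often it orders compound pairs correctly as the SALI threshold rises. It is an illustration under stated assumptions, not the authors' implementation: it assumes the commonly cited definition SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), scores each threshold as (correctly ordered pairs - incorrectly ordered pairs) / pairs retained, and integrates the curve with a plain trapezoid rule, which may differ from the normalization used for the published SCI. All function names, the activity values, and the similarity matrix are hypothetical.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sali(act_i, act_j, sim_ij, eps=1e-9):
    """Assumed SALI definition: activity gap scaled by structural dissimilarity."""
    return abs(act_i - act_j) / max(1.0 - sim_ij, eps)


def pair_scores(observed, predicted, similarity):
    """Return (SALI, agreement) per pair; agreement is +1, -1, or 0 (predicted tie)."""
    pairs = []
    for i, j in combinations(range(len(observed)), 2):
        d_obs = observed[i] - observed[j]
        if d_obs == 0:
            continue  # equal observed activities give no ordering to test
        d_pred = predicted[i] - predicted[j]
        pairs.append((sali(observed[i], observed[j], similarity[i][j]),
                      sign(d_obs) * sign(d_pred)))
    return pairs


def sali_curve(pairs, n_points=101):
    """Score the pairs whose SALI is at least a fraction x of the maximum SALI."""
    max_sali = max(s for s, _ in pairs)
    curve = []
    for k in range(n_points):
        x = k / (n_points - 1)
        kept = [a for s, a in pairs if s >= x * max_sali]
        curve.append((x, sum(kept) / len(kept)))
    return curve


def curve_integral(curve):
    """Trapezoid-rule area under the curve; lies in [-1, 1] for x in [0, 1]."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


# Hypothetical data: four compounds, one mis-ordered pair (the last two).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.0]
similarity = [[1.0, 0.9, 0.5, 0.3],
              [0.9, 1.0, 0.6, 0.4],
              [0.5, 0.6, 1.0, 0.8],
              [0.3, 0.4, 0.8, 1.0]]

curve = sali_curve(pair_scores(observed, predicted, similarity))
# The first point uses every pair, so it equals (5 - 1) / 6.
print(curve[0][1], curve_integral(curve))
```

At a threshold of zero every pair is retained, so the first point of the curve is (concordant - discordant) / total pairs, which is Kendall's τ (the tau-a form) when there are no ties. Higher thresholds restrict the score to pairs that are structurally similar yet differ strongly in activity, which is where the curve can reveal behavior that a single data-set-wide statistic averages away.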
