Data Set Modelability by QSAR

We introduce a simple MODelability Index (MODI) that estimates the feasibility of obtaining predictive QSAR models (correct classification rate above 0.7) for a binary data set of bioactive compounds. MODI is defined as an activity class-weighted ratio of the number of nearest-neighbor pairs of compounds with the same activity class to the total number of pairs. MODI values were calculated for more than 100 data sets, and a threshold of 0.65 was found to separate the nonmodelable from the modelable data sets.
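
The definition above can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: "class-weighted" is read as taking, for each activity class, the fraction of compounds whose first nearest neighbor belongs to the same class, then averaging those fractions over the classes; Euclidean distance in descriptor space is used as the similarity measure; and the function name `modi` and the toy descriptor values are purely illustrative.

```python
import numpy as np

def modi(X, y):
    """Sketch of a modelability index for a labeled data set.

    X : (n_compounds, n_descriptors) array of descriptor values
    y : (n_compounds,) array of activity class labels

    Assumption: the index is the mean, over activity classes, of the
    fraction of compounds whose nearest neighbor shares their class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise Euclidean distances (assumed metric; other choices are possible).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # A compound must not be counted as its own nearest neighbor.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)

    # True where a compound and its nearest neighbor share an activity class.
    same_class = y[nearest] == y

    # Average the per-class fractions so each class contributes equally.
    per_class = [same_class[y == c].mean() for c in np.unique(y)]
    return float(np.mean(per_class))

# Toy example: two well-separated clusters, one per activity class.
X_toy = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y_toy = [0, 0, 0, 1, 1, 1]
print(modi(X_toy, y_toy))
```

Under this reading, a value would be compared with the 0.65 threshold reported above. Averaging per-class fractions, rather than pooling all compounds, keeps a large majority class from dominating the index in imbalanced data sets.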
