Mining flexible‐receptor molecular docking data

Knowledge discovery in databases has become an integral part of practically every aspect of bioinformatics research, which usually produces, and has to process, very large amounts of data. Rational drug design is one of the scientific areas that has benefited greatly from bioinformatics, particularly in the step that analyzes receptor–ligand interactions via molecular docking simulations. An important challenge is the inclusion of receptor flexibility, since docking simulations that account for it can become computationally very demanding. We have represented this explicit flexibility as a series of different conformations derived from a molecular dynamics simulation trajectory of the receptor. This model has been termed the fully flexible receptor (FFR) model. In our studies, the receptor is the enzyme InhA from Mycobacterium tuberculosis, which is the major drug target for the treatment of tuberculosis. The FFR model of InhA (named FFR_InhA) was docked to four ligands, namely nicotinamide adenine dinucleotide, pentacyano(isoniazid)ferrate(II), triclosan, and ethionamide, thus generating very large amounts of data, which need to be mined to produce useful knowledge to help accelerate drug discovery and development. Very little work has been done in this area. In this article, we review our work on the application of classification decision trees, regression model trees, and association rules to properly preprocessed data from the FFR molecular docking results, and show how they can provide an improved understanding of the FFR_InhA‐ligand behavior. Furthermore, we explain how data mining techniques can support the acceleration of molecular docking simulations of FFR models. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1: 532–541. DOI: 10.1002/widm.46
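To make the association-rule idea concrete, the following is a minimal sketch of how the support and confidence of a single rule could be computed over discretized docking records. It is not the authors' pipeline: the records, the item names (`res_A=close`, `binding=good`, and so on), and the helper functions are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List

# Hypothetical toy records: one per docking of a ligand to a receptor snapshot.
# Item names and values are invented for illustration; they are NOT data
# from the article.
Record = FrozenSet[str]


def support(records: List[Record], itemset: Record) -> float:
    """Fraction of records that contain every item in `itemset`."""
    if not records:
        return 0.0
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


def confidence(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """Estimate P(consequent | antecedent) as a ratio of supports."""
    antecedent_support = support(records, antecedent)
    if antecedent_support == 0.0:
        return 0.0
    return support(records, antecedent | consequent) / antecedent_support


toy_records: List[Record] = [
    frozenset({"res_A=close", "res_B=close", "binding=good"}),
    frozenset({"res_A=close", "res_B=far", "binding=good"}),
    frozenset({"res_A=far", "res_B=close", "binding=poor"}),
    frozenset({"res_A=close", "res_B=close", "binding=poor"}),
]

# Candidate rule: {res_A=close} -> {binding=good}
rule_support = support(toy_records, frozenset({"res_A=close", "binding=good"}))
rule_confidence = confidence(
    toy_records,
    frozenset({"res_A=close"}),
    frozenset({"binding=good"}),
)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In practice, a rule miner such as Apriori would enumerate many candidate rules and keep only those above chosen support and confidence thresholds; the sketch evaluates one rule by hand to show what those two measures mean.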
