A comprehensive automated computer-aided discovery pipeline from genomes to hit molecules

Abstract Large-scale sequencing of genomes and proteomes has produced thousands of whole genomes and millions of protein sequences. However, using these data to generate drug-like molecules for curing diseases remains a challenge. Here we propose Dhanvantari, a comprehensive software suite that automates the journey from genomes to hit molecules for computer-aided drug discovery through three modules: i) gene finding, ii) computational structural study of target proteins and iii) virtual screening/identification of hit molecules. The pipeline has five possible entry points. The protocol is validated on 10 major life-threatening diseases reported by the WHO, covering 33 different protein targets and 111 FDA-approved drugs. Three complete case studies from genomes to hits are also performed using the pipeline, in which the reported FDA-approved drugs against the pathogen were successfully recovered. The proposed software suite complements classical structure-based approaches with "genome-based" drug discovery, offering new opportunities and insights for discovering novel potential drug-like molecules. The entire protocol requires ~6–12 h, although individual steps complete within a few minutes. The pipeline can be freely accessed at http://www.scfbio-iitd.res.in/software/dhanvantari_new/Home.html.
