Large-scale data-driven integrative framework for extracting essential targets and processes from disease-associated gene data sets

Populations worldwide currently face several public health challenges, including a growing prevalence of infections and the emergence of new pathogenic organisms. The cost and risk associated with drug development make new drugs for many diseases, especially orphan or rare diseases, unappealing to the pharmaceutical industry. Proof of drug safety and efficacy is required before market approval, and rigorous testing makes the drug development process slow, expensive and prone to failure. This failure is often due to the use of irrelevant targets identified in the early steps of the drug discovery process, suggesting that target identification and validation are cornerstones of successful drug discovery and development. Here, we present a large-scale data-driven integrative computational framework to extract essential targets and processes from an existing disease-associated data set and to enhance target selection by leveraging drug-target-disease associations at the systems level. We applied this framework to tuberculosis and Ebola virus disease, combining heterogeneous data from multiple sources, including protein-protein functional interaction, functional annotation and pharmaceutical data sets. The results demonstrate the effectiveness of the pipeline, which led to the extraction of essential drug targets and supports the rational use of existing approved drugs. This provides an opportunity to move toward optimal target-based strategies for screening available drugs and for drug discovery. The model has the potential to bridge the gap in the production of orphan disease therapies by offering a systematic approach to predicting new uses for existing drugs, thereby harnessing their full therapeutic potential.
