Taxonomic propagation of phenotypic features predicts host-pathogen interactions

Identifying host-pathogen interactions can reveal mechanistic insights into infectious diseases that inform potential treatments and drug discovery. Current computational methods focus on predicting host-pathogen protein interactions and rely on knowledge of the sequences and functions of pathogen proteins, which is limited for many species, especially emerging pathogens. We developed an ontology-based machine learning method that predicts potential interacting protein partners for pathogen taxa. Our method exploits information about infectious disease mechanisms through features learned from phenotypic, functional and taxonomic knowledge about pathogen taxa and human proteins. Additionally, by propagating the phenotypic features of pathogens within a formal representation of pathogen taxonomy, we demonstrate that our model can accurately predict interacting protein partners even for pathogens without known phenotypes, using their taxonomic relationships with other pathogens together with information from ontologies as background knowledge. Our results show that integrating phenotypic, functional and taxonomic knowledge not only improves prediction performance but also enables us to investigate novel pathogens in emerging infectious diseases.

Author summary

Infectious diseases are caused by various types of pathogens, such as bacteria and viruses, and lead to millions of deaths each year, especially in low-income countries. Researchers have been attempting to predict and study possible host-pathogen interactions at the molecular level. Understanding these interactions can shed light on how pathogens invade cells or disrupt the immune system. We propose a novel method that predicts such interactions by relating phenotypes associated with pathogens (e.g., the signs and symptoms of patients) to phenotypes associated with human proteins. We are able to accurately predict and prioritize possible protein partners for dozens of pathogens. We further extended the prediction model by relating pathogens without phenotypes to those with phenotypes through their taxonomic relationships. We found that adding taxonomic knowledge greatly increased the number of pathogens that we can study, without diminishing the accuracy of the model. To the best of our knowledge, we are the first to predict host-pathogen interactions based on phenotypes and taxonomy. Our work has important implications for new pathogens and emerging infectious diseases that are not yet well studied.
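The idea of relating pathogens without phenotypes to those with phenotypes through taxonomy can be illustrated with a deliberately simplified sketch. The text above does not specify how propagation is carried out, and the method described learns features from ontologies rather than copying annotations. The code below therefore assumes a plain set-union over the nearest annotated clade. All taxon names, phenotype labels, and the `propagate` and `descendants` helpers are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical toy taxonomy: child taxon -> parent taxon (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "family_F": None,
    "genus_G": "family_F",
    "species_A": "genus_G",
    "species_B": "genus_G",
    "species_new": "genus_G",  # a pathogen with no known phenotypes
}

# Hypothetical phenotype annotations for the taxa that have them.
PHENOTYPES: Dict[str, Set[str]] = {
    "species_A": {"fever", "rash"},
    "species_B": {"fever", "cough"},
}


def descendants(taxon: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the taxon together with every taxon below it."""
    found = {taxon}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in found and child not in found:
                found.add(child)
                changed = True
    return found


def propagate(
    taxon: str,
    parent: Dict[str, Optional[str]],
    phenotypes: Dict[str, Set[str]],
) -> Set[str]:
    """Return the taxon's own phenotypes, or those pooled from its
    nearest ancestor whose clade contains any annotated taxon."""
    if phenotypes.get(taxon):
        return set(phenotypes[taxon])
    ancestor = parent.get(taxon)
    while ancestor is not None:
        pooled: Set[str] = set()
        for relative in descendants(ancestor, parent):
            pooled |= phenotypes.get(relative, set())
        if pooled:
            return pooled
        ancestor = parent.get(ancestor)  # climb one level and retry
    return set()


if __name__ == "__main__":
    print(sorted(propagate("species_new", PARENT, PHENOTYPES)))
```

In this toy setting, `species_new` has no annotations of its own, so it receives the phenotypes pooled from its sibling species under `genus_G`. Annotated taxa keep their own phenotypes unchanged.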
