Network inference from multimodal data: A review of approaches from infectious disease transmission

Abstract: Network inference problems arise in many biomedical subfields, including genomics, metagenomics, neuroscience, and epidemiology. Networks can represent a wide range of complex interactions, from those among molecular biomarkers, neurons, and microbial communities to those found in human or animal populations. Recent technological advances have produced a growing volume of healthcare data in multiple modalities, making network inference problems increasingly common. Multi-domain data can now be used to improve the robustness and reliability of networks that would otherwise be recovered from unimodal data alone. For infectious diseases in particular, a body of work has focused on combining multiple pieces of linked information. Analyzing disparate modalities in concert has yielded greater insight into disease transmission than any single modality in isolation. This has been particularly helpful for understanding incidence and transmission at the early stages of infections with pandemic potential. Novel pieces of linked information, in the form of spatial, temporal, and other covariates including high-throughput sequence data, clinical visits, social network information, pharmaceutical prescriptions, and clinical symptoms (reported as free-text data), also encourage further investigation of these methods. The purpose of this review is to provide an in-depth analysis of multimodal methods for inferring infectious disease transmission networks, with a specific focus on Bayesian inference. We focus on analytical Bayesian inference-based methods because they allow multiple parameters to be recovered simultaneously: for example, not just the disease transmission network but also the parameters of the epidemic dynamics. Our review examines the assumptions, key inference parameters, and limitations of these methods, and ultimately provides insights for improving future network inference methods across multiple applications.
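
As a minimal sketch of what recovering several quantities simultaneously can look like, consider a joint posterior over the network and the dynamics. The symbols here are illustrative assumptions rather than the notation of any specific method covered by the review: $T$ is the transmission network, $\theta$ the epidemic-dynamics parameters, $G$ the pathogen sequence data, and $E$ the epidemiological data (for example, symptom onset times or locations). The factorization further assumes that $G$ and $E$ are conditionally independent given $T$ and $\theta$.

```latex
P(T, \theta \mid G, E) \;\propto\; P(G \mid T, \theta)\, P(E \mid T, \theta)\, P(T \mid \theta)\, P(\theta)
```

Under these assumptions, each data modality contributes its own likelihood term, and drawing samples from the joint posterior (for instance with MCMC) gives estimates of both $T$ and $\theta$ rather than of the network alone.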
