Predicting Viral Infection From High-Dimensional Biomarker Trajectories

There is often interest in predicting an individual’s latent health status from high-dimensional biomarkers that vary over time. Motivated by time-course gene expression array data that we collected in two influenza challenge studies with healthy human volunteers, we develop a novel time-aligned Bayesian dynamic factor analysis methodology. The time-course trajectories of gene expression are related to a relatively low-dimensional vector of latent factors, which vary dynamically starting at the latent initiation time of infection. Using a nonparametric cure rate model for the latent initiation times, we allow for selection of the genes in the viral response pathway, variability among individuals in infection times, and a subset of individuals who are not infected. As we demonstrate using held-out data, this statistical framework accurately predicts which individuals are infected before clinical symptoms develop, without labeled data and even when the number of biomarkers vastly exceeds the number of individuals under study. We also provide a biological interpretation of several of the inferred pathways (factors).
