Automated Discovery of Functional Generality of Human Gene Expression Programs

An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. 
We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high-generality programs may maintain common physiological responses that go awry in disease states. Further, our method is multipurpose and can be applied readily to novel compendia of biological data.
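The abstract notes that the number of programs and sample groups is not known in advance, which is the kind of uncertainty Dirichlet process priors are designed to handle. As a rough illustration only, the sketch below samples a partition from a Chinese restaurant process, the standard sequential view of a single Dirichlet process. It is not GeneProgram's implementation or its hierarchical model; the function name, the concentration parameter `alpha`, and the seed are assumptions introduced for this example.

```python
import random


def chinese_restaurant_process(n_items, alpha, seed=0):
    """Sample cluster assignments for n_items from a Chinese restaurant process.

    Hypothetical helper for illustration; alpha > 0 is the concentration
    parameter (larger values tend to produce more clusters).
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items currently in cluster k
    assignments = []   # assignments[i] = cluster index of item i
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = chinese_restaurant_process(n_items=200, alpha=1.0)
    print("clusters used:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters is not fixed beforehand; it grows with the data at a rate governed by `alpha`, which is the property that makes this family of priors attractive when the number of expression programs is unknown.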
