A computational framework for the analysis of multi-species microarray data

In this thesis I present algorithms for the analysis of microarray expression data from multiple species. These algorithms are used to identify core genes in two biological systems: the cell cycle and the immune response. Data generated by high-throughput biological experiments now make it possible to study organisms at the systems level. One of the first tasks facing researchers is to identify the core components of the biological subsystems within an organism. This task is made difficult by the high levels of experimental and biological noise in such experiments. To address this problem I introduce a new computational framework that combines data from multiple species, both to improve prediction accuracy and to identify important subsets of genes involved in a given system. The framework is based on Markov random fields, which allow microarray and sequence data from multiple species to be integrated. Applying this framework to cell cycle regulated genes, I identify genes that represent the core machinery of the cell cycle. These findings are supported by both complementary high-throughput data and motif analysis. In addition, I apply the framework to study the immune response in human and mouse. I show that using Gaussian random fields instead of discrete Markov random fields gives better accuracy in predicting immune response genes. Finally, I identify a list of immune response genes that are conserved between cell types and species for further experimental study.
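The idea of coupling noisy per-gene evidence across species through a Gaussian random field can be illustrated with a minimal sketch. This is not the model developed in the thesis: the function name `smooth_scores`, the quadratic objective, the weight `lam`, and the toy data are all assumptions chosen for illustration. Each gene has a noisy expression-derived score `y`, sequence-similar genes from different species are linked by an edge weight in `W`, and the smoothed scores `f` minimise a data-fit term plus a penalty on disagreement between linked genes.

```python
import numpy as np


def smooth_scores(y, W, lam=1.0):
    """Illustrative Gaussian-random-field smoothing (hypothetical helper).

    Minimises  sum_i (f_i - y_i)^2 + lam * sum_{i<j} W_ij * (f_i - f_j)^2,
    whose closed-form solution is f = (I + lam * L)^{-1} y, where
    L = D - W is the graph Laplacian of the similarity graph.
    """
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(len(y)) + lam * laplacian, y)


# Toy data (made up): genes 0 and 1 belong to species A, genes 2 and 3 to species B.
# Gene 0 has strong expression evidence; its homolog, gene 2, has none.
y = np.array([2.0, -1.0, 0.0, -1.0])

# Symmetric similarity matrix with a single cross-species homology edge (0 <-> 2).
W = np.zeros((4, 4))
W[0, 2] = W[2, 0] = 1.0

f = smooth_scores(y, W, lam=1.0)
print(f)
```

In this toy case the homology edge pulls the score of gene 2 up toward that of gene 0, while the unlinked genes keep their original scores. A discrete Markov random field would instead assign each gene a binary label and require approximate inference such as belief propagation or graph cuts.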
