Combinatorial Genetic Regulatory Network Analysis Tools for High Throughput Transcriptomic Data

A series of genome-scale algorithms and high-performance implementations is described and shown to be useful in the genetic analysis of gene transcription. With them it is possible to address common questions such as: "are the sets of genes co-expressed under one type of condition the same as those co-expressed under another?" A new noise-adaptive graph algorithm, dubbed "paraclique," is introduced and analyzed for use in biological hypothesis testing. A notion of vertex coverage is also devised, based on vertex-disjoint paths within correlation graphs, and used to determine the identity, proportion and number of transcripts connected to individual phenotypes and to quantitative trait locus (QTL) regulatory models. A major goal is to identify which of a set of candidate genes are the most likely regulators of trait variation. These methods are applied in an effort to identify multiple-QTL regulatory models for large groups of genetically co-expressed genes, and, through the evaluation of vertex coverage, to extrapolate the consequences of this genetic variation for phenotypes observed across levels of biological scale. The approach is further applied to the definition of homology-based gene sets and to the incorporation of categorical data such as known gene pathways. In all these tasks, discrete mathematics and combinatorial algorithms form the organizing principles on which methods and implementations are based.
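The abstract names the paraclique algorithm but does not spell out its steps. The following is a minimal sketch under stated assumptions, not the authors' implementation: a correlation graph is assumed to join two genes when the absolute Pearson correlation of their expression profiles meets a chosen threshold, and a paraclique is assumed to be grown from a maximum clique by adding any vertex that is non-adjacent to at most `glom` current members. The function names, the `threshold` and `glom` parameters, and the exhaustive clique search (suitable for toy graphs only) are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_graph(profiles, threshold):
    """Build an adjacency dict; an edge joins genes with |r| >= threshold.

    Assumption: thresholding absolute correlation is one common way to
    define such a graph; the source does not state the rule it uses.
    """
    adj = {g: set() for g in profiles}
    for g, h in combinations(profiles, 2):
        if abs(pearson(profiles[g], profiles[h])) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def maximum_clique(adj):
    """Exhaustive branch-and-bound search; intended for toy graphs only."""
    best = set()

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = set(clique)
        # Prune: this branch cannot beat the best clique found so far.
        if len(clique) + len(candidates) <= len(best):
            return
        for v in sorted(candidates):
            expand(clique | {v}, candidates & adj[v])
            candidates = candidates - {v}

    expand(set(), set(adj))
    return best


def paraclique(adj, glom=1):
    """Grow a dense subgraph from a maximum clique.

    Assumption: a vertex joins when it is non-adjacent to at most `glom`
    current members; passes repeat until no vertex qualifies.
    """
    core = maximum_clique(adj)
    members = set(core)
    grew = True
    while grew:
        grew = False
        for v in sorted(set(adj) - members):
            if len(members - adj[v]) <= glom:
                members.add(v)
                grew = True
    return core, members
```

With `glom=0` the result is just the maximum clique; larger values tolerate a few missing edges, which is one way to read "noise-adaptive" when edges are lost to measurement noise near the correlation threshold.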
