Increasing the Power to Detect Causal Associations by Combining Genotypic and Expression Data in Segregating Populations

To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.

[1]  Adam J. Smith,et al.  The Database of Interacting Proteins: 2004 update , 2004, Nucleic Acids Res..

[2]  Ioannis Xenarios,et al.  DIP: the Database of Interacting Proteins , 2000, Nucleic Acids Res..

[3]  S. L. Wong,et al.  Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network , 2005, Journal of biology.

[4]  Nir Friedman,et al.  Inferring subnetworks from perturbed expression profiles , 2001, ISMB.

[5]  L. Hood,et al.  A Genomic Regulatory Network for Development , 2002, Science.

[6]  V. Anne Smith,et al.  Evaluating functional network inference using simulations of complex biological systems , 2002, ISMB.

[7]  Ioannis Xenarios,et al.  Mining literature for protein-protein interactions , 2001, Bioinform..

[8]  Robert W. Williams,et al.  Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function , 2005, Nature Genetics.

[9]  Z B Zeng,et al.  Multiple trait analysis of genetic mapping for quantitative trait loci. , 1995, Genetics.

[10]  E. Lander,et al.  Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. , 1989, Genetics.

[11]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[12]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[13]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[14]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[15]  Andrew I Su,et al.  Uncovering regulatory pathways that affect hematopoietic stem cell function using 'genetical genomics' , 2005, Nature Genetics.

[16]  Paul P. Wang,et al.  Advances to Bayesian network inference for generating causal networks from observational biological data , 2004, Bioinform..

[17]  Albert-László Barabási,et al.  Genetic Dissection of Transcriptional Regulation in Budding Yeast , 2002 .

[18]  B. Kholodenko,et al.  Quantification of Short Term Signaling by the Epidermal Growth Factor Receptor* , 1999, The Journal of Biological Chemistry.

[19]  C. Cockerham,et al.  An Extension of the Concept of Partitioning Hereditary Variance for Analysis of Covariances among Relatives When Epistasis Is Present. , 1954, Genetics.

[20]  John D. Storey,et al.  Multiple Locus Linkage Analysis of Genomewide Expression in Yeast , 2005, PLoS biology.

[21]  J. Nap,et al.  Genetical genomics: the added value from segregation. , 2001, Trends in genetics : TIG.

[22]  L Kruglyak,et al.  A nonparametric approach for mapping quantitative trait loci. , 1995, Genetics.

[23]  J. Collins,et al.  Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling , 2003, Science.

[24]  J. York,et al.  Bayesian Graphical Models for Discrete Data , 1995 .

[25]  J. Castle,et al.  An integrative genomics approach to infer causal associations between gene expression and disease , 2005, Nature Genetics.

[26]  Kathleen Marchal,et al.  SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms , 2006, BMC Bioinformatics.

[27]  E E Schadt,et al.  Integrating genotypic and expression data in a segregating mouse population to identify 5-lipoxygenase as a susceptibility gene for obesity and bone traits , 2005, Nature Genetics.

[28]  R. Stoughton,et al.  Genetics of gene expression surveyed in maize, mouse and man , 2003, Nature.

[29]  Stephen W. Edwards,et al.  Microarray Standard Data Set and Figures of Merit for Comparing Data Processing Methods and Experiment Designs , 2003, Bioinform..

[30]  J. Zhu,et al.  An integrative genomics approach to the reconstruction of gene networks in segregating populations , 2004, Cytogenetic and Genome Research.

[31]  David Page,et al.  Modelling regulatory pathways in E. coli from time series expression profiles , 2002, ISMB.

[32]  E. Schadt,et al.  Genetic inheritance of gene expression in human cell lines. , 2004, American journal of human genetics.

[33]  E. Schadt,et al.  Genetic and Genomic Analysis of a Fat Mass Trait with Complex Inheritance Reveals Marked Sex Specificity , 2006, PLoS genetics.

[34]  L. Kruglyak,et al.  Genetic Dissection of Transcriptional Regulation in Budding Yeast , 2002, Science.

[35]  K. Sachs,et al.  Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data , 2005, Science.

[36]  Leonid Kruglyak,et al.  Local Regulatory Variation in Saccharomyces cerevisiae , 2005, PLoS genetics.

[37]  Keith Shockley,et al.  Structural Model Analysis of Multiple Quantitative Traits , 2006, PLoS genetics.

[38]  J. Lamb,et al.  Elucidating the murine brain transcriptional network in a segregating mouse population to identify core functional modules for obesity and diabetes , 2006, Journal of neurochemistry.

[39]  Manjunatha Jagalur,et al.  Causal inference of regulator-target pairs by gene mapping of expression phenotypes , 2005, BMC Genomics.

[40]  Eric E Schadt,et al.  Integrating QTL and high-density SNP analyses in mice to identify Insig2 as a susceptibility gene for plasma cholesterol levels. , 2005, Genomics.

[41]  Ioannis Xenarios,et al.  DIP: The Database of Interacting Proteins: 2001 update , 2001, Nucleic Acids Res..

[42]  C. Molony,et al.  Genetic analysis of genome-wide variation in human gene expression , 2004, Nature.

[43]  Eric E Schadt,et al.  Cis-acting expression quantitative trait loci in mice. , 2005, Genome research.