Analysis of SNP-expression association matrices

High-throughput expression profiling and genotyping technologies provide the means to study the genetic determinants of population variation in gene expression. In this paper we present a general statistical framework for the simultaneous analysis of gene expression data and SNP genotype data measured for the same cohort. The framework consists of methods to associate transcripts with SNPs affecting their expression, algorithms to detect subsets of transcripts that share significantly many associations with a subset of SNPs, and methods to visualize the identified relations. We apply our framework to SNP-expression data collected from 49 breast cancer patients. Our results demonstrate an overabundance of transcript-SNP associations in these data, and pinpoint SNPs that are potential master regulators of transcription. We also identify several statistically significant subsets of transcripts with common putative regulators that fall into well-defined functional categories.
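To make the notions of "overabundance" and "master regulator" concrete, the following is a minimal sketch and not the method of the paper. It assumes that a matrix of association p-values (rows are transcripts, columns are SNPs) has already been computed by some per-pair test, which is not specified here. Overabundance is illustrated by comparing the observed number of small p-values with the number expected under a uniform null. Candidate master regulators are illustrated by counting, per SNP, the transcripts whose association survives a Benjamini-Hochberg correction. The function names and the toy matrix are hypothetical.

```python
from typing import List, Tuple


def overabundance(pvals: List[List[float]], threshold: float) -> Tuple[int, float]:
    """Observed count of p <= threshold, and the count expected under a uniform null."""
    flat = [p for row in pvals for p in row]
    observed = sum(1 for p in flat if p <= threshold)
    expected = threshold * len(flat)
    return observed, expected


def benjamini_hochberg(pvals: List[float], q: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure: one reject flag per p-value at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k such that the k-th smallest p-value is <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


def snp_association_counts(pvals: List[List[float]], q: float) -> List[int]:
    """For each SNP (column), count transcripts (rows) whose association survives BH."""
    n_rows = len(pvals)
    n_cols = len(pvals[0]) if n_rows else 0
    flat = [p for row in pvals for p in row]
    flags = benjamini_hochberg(flat, q)
    counts = [0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            if flags[i * n_cols + j]:
                counts[j] += 1
    return counts


# Toy, made-up p-value matrix: 3 transcripts x 3 SNPs (illustration only).
toy_pvals = [
    [0.001, 0.5, 0.8],
    [0.002, 0.6, 0.9],
    [0.003, 0.7, 0.04],
]
observed, expected = overabundance(toy_pvals, threshold=0.05)
counts = snp_association_counts(toy_pvals, q=0.05)
```

In this toy matrix the first SNP is associated with all three transcripts after correction, which is the kind of column one would flag as a candidate master regulator. A real analysis would also need to assess whether such a count is itself statistically significant.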
