Data analysis issues for allele-specific expression using Illumina's GoldenGate assay

Background
High-throughput measurement of allele-specific expression (ASE) is a relatively new and exciting application area for array-based technologies. In this paper, we explore several data sets that use Illumina's GoldenGate BeadArray technology to measure ASE. This platform exploits coding SNPs to obtain relative expression measurements for alleles at approximately 1500 positions in the genome.

Results
We analyze data from a mixture experiment in which genomic DNA samples from pairs of individuals of known genotype are pooled to create allelic imbalances of varying size for the majority of SNPs on the array. We observe that GoldenGate is less sensitive to subtle allelic imbalances (around 1.3-fold) than to extreme imbalances, and we note the benefit of applying local background correction to the data. Analysis of data from a dye-swap control experiment allowed us to quantify dye bias, which can be reduced considerably by careful normalization. We also demonstrate the need to filter the data before downstream analysis to remove non-responding probes, which show either weak or non-specific signal for each allele. Throughout this paper, we find that fitting a linear model to the data from each SNP is a flexible modelling strategy that allows allelic imbalance to be tested in each sample when replicate hybridizations are available.

Conclusions
Our analysis shows that local background correction carried out by Illumina's software, together with quantile normalization of the red and green channels within each array, performs best in terms of false positive rates. In addition, we strongly encourage intensity-based filtering to remove SNPs that measure only non-specific signal. We anticipate that a similar analysis strategy will prove useful when quantifying ASE on Illumina's higher-density Infinium BeadChips.
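To make the mixture design concrete, the sketch below computes the allelic ratio expected when genomic DNA from two individuals of known genotype is pooled. It is a worked illustration, not the paper's actual design: the genotype coding (copies of allele A), the mixing proportions and the function names are assumptions, and it presumes both individuals are diploid at the SNP, so the expected fraction of allele A is p * g1 / 2 + (1 - p) * g2 / 2.

```python
import math


def expected_allele_fraction(geno1: int, geno2: int, p: float) -> float:
    """Expected fraction of allele A in a pool of two individuals' genomic DNA.

    geno1, geno2: copies of allele A (0, 1 or 2) carried by individuals 1 and 2.
    p: proportion of the pooled DNA contributed by individual 1.
    Assumes both individuals are diploid at the SNP (no copy-number change).
    """
    if geno1 not in (0, 1, 2) or geno2 not in (0, 1, 2):
        raise ValueError("genotypes are coded as 0, 1 or 2 copies of allele A")
    if not 0.0 <= p <= 1.0:
        raise ValueError("mixing proportion must lie in [0, 1]")
    return p * geno1 / 2 + (1 - p) * geno2 / 2


def expected_fold_change(geno1: int, geno2: int, p: float) -> float:
    """Expected A:B allelic ratio in the pool (1.0 means no imbalance)."""
    f = expected_allele_fraction(geno1, geno2, p)
    if f == 1.0:
        return math.inf  # allele B is absent from the pool
    return f / (1 - f)


# An AA homozygote pooled 50:50 with an AB heterozygote
print(expected_fold_change(2, 1, 0.5))  # → 3.0
# Diluting the homozygote moves the pool towards a subtle imbalance
for p in (0.5, 0.25, 0.1):
    print(p, round(expected_fold_change(2, 1, p), 2))
```

Under these assumptions, a subtle imbalance of roughly 1.3-fold in an AA plus AB pool requires only a small contribution from the homozygous individual, which is the regime where reduced sensitivity matters most.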
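Within-array quantile normalization of the red and green channels can be sketched as follows. The code assumes intensities have already been locally background corrected (by Illumina's software, as in the text) and works on the log2 scale; the floor of 1.0 used for zero or negative corrected values, the function names and the position-based tie-breaking are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def to_log2(values: Sequence[float], floor: float = 1.0) -> List[float]:
    """log2-transform background-corrected intensities.

    Local background subtraction can leave zero or negative values, so values
    below `floor` are raised to it first (the default of 1.0 is an assumption).
    """
    return [math.log2(max(v, floor)) for v in values]


def quantile_normalize_two_channels(
    red: Sequence[float], green: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Give the red and green channels of one array an identical distribution.

    Both channels are sorted, the sorted values are averaged rank by rank, and
    each probe receives the average for its rank in its own channel. Ties are
    broken by position, which is a simplification.
    """
    if len(red) != len(green):
        raise ValueError("both channels must contain the same probes")
    n = len(red)
    red_order = sorted(range(n), key=lambda i: red[i])
    green_order = sorted(range(n), key=lambda i: green[i])
    # Reference distribution: rank-wise mean of the two sorted channels
    reference = [(red[r] + green[g]) / 2 for r, g in zip(red_order, green_order)]
    red_norm = [0.0] * n
    green_norm = [0.0] * n
    for rank, (r, g) in enumerate(zip(red_order, green_order)):
        red_norm[r] = reference[rank]
        green_norm[g] = reference[rank]
    return red_norm, green_norm


# Hypothetical background-corrected intensities for four SNPs on one array
red = to_log2([1200.0, 300.0, -15.0, 8000.0])
green = to_log2([900.0, 450.0, 40.0, 5000.0])
red_n, green_n = quantile_normalize_two_channels(red, green)
# Per-SNP log-ratio M = log2(red / green) after normalization
m_values = [r - g for r, g in zip(red_n, green_n)]
print(m_values)
```

Because both channels end up with the same distribution, global and intensity-dependent differences between red and green are removed; SNP-specific dye effects are not removed by this step alone.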
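The intensity-based filtering recommended above can be sketched as a per-SNP threshold on summed allele intensity. The criterion (median of allele A plus allele B signal across hybridizations), the threshold value and the SNP identifiers are assumptions for illustration; in practice the cut-off might be derived from negative-control beads or the observed intensity distribution, and the paper's exact rule is not reproduced here.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def filter_non_responding(
    intensities: Dict[str, Sequence[Tuple[float, float]]],
    threshold: float,
) -> List[str]:
    """Keep SNPs whose median summed allele signal across hybridizations exceeds threshold.

    intensities maps a SNP identifier to one (allele_A, allele_B) intensity pair
    per hybridization. A SNP whose combined signal stays near background in
    most hybridizations is treated as non-responding and dropped.
    """
    kept = []
    for snp_id, pairs in intensities.items():
        totals = [a + b for a, b in pairs]
        if totals and median(totals) > threshold:
            kept.append(snp_id)
    return kept


# Hypothetical data: one responding SNP and one near background
example = {
    "snp_1": [(1000.0, 800.0), (1200.0, 900.0), (1100.0, 950.0)],
    "snp_2": [(30.0, 25.0), (40.0, 20.0), (35.0, 30.0)],
}
print(filter_non_responding(example, threshold=200.0))  # → ['snp_1']
```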
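The per-SNP linear model can be sketched with ordinary least squares. The sketch assumes each replicate hybridization yields a normalized log-ratio M = log2(red/green) and is coded x = +1 when allele A is read in the red channel and x = -1 when the labelling is swapped, so the model M = dye_bias + imbalance * x separates a dye effect from the allelic imbalance. The names and coding are hypothetical, and the ordinary t-statistic here is a simplification: packages such as Bioconductor's limma additionally moderate the residual variance across SNPs with an empirical Bayes step.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SnpFit:
    imbalance: float  # estimated log2(A/B), the allelic-imbalance coefficient
    dye_bias: float   # estimated log2(red/green) offset shared by all hybridizations
    t_stat: float     # ordinary t-statistic for imbalance == 0
    df: int           # residual degrees of freedom


def fit_snp(log_ratios: Sequence[float], orientation: Sequence[int]) -> SnpFit:
    """Fit M_j = dye_bias + imbalance * x_j by ordinary least squares for one SNP.

    log_ratios: M_j = log2(red / green) for each replicate hybridization j.
    orientation: x_j = +1 if allele A is in the red channel, -1 if swapped.
    Both orientations must be present, otherwise the terms are confounded.
    """
    n = len(log_ratios)
    if n != len(orientation):
        raise ValueError("one orientation code is needed per hybridization")
    if n < 3:
        raise ValueError("need at least 3 hybridizations to estimate the variance")
    x_bar = sum(orientation) / n
    y_bar = sum(log_ratios) / n
    sxx = sum((x - x_bar) ** 2 for x in orientation)
    if sxx == 0:
        raise ValueError("both dye orientations are required")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(orientation, log_ratios))
    imbalance = sxy / sxx
    dye_bias = y_bar - imbalance * x_bar
    rss = sum((y - dye_bias - imbalance * x) ** 2 for x, y in zip(orientation, log_ratios))
    df = n - 2
    se = math.sqrt(rss / df / sxx)
    if se > 0:
        t_stat = imbalance / se
    else:
        # Perfect fit: the t-statistic is undefined or unbounded
        t_stat = math.nan if imbalance == 0 else math.copysign(math.inf, imbalance)
    return SnpFit(imbalance, dye_bias, t_stat, df)


# Hypothetical replicates: three standard and three dye-swapped hybridizations
fit = fit_snp([1.25, 1.15, 1.20, -0.75, -0.85, -0.80], [1, 1, 1, -1, -1, -1])
print(round(fit.imbalance, 3), round(fit.dye_bias, 3))  # → 1.0 0.2
```

A two-sided p-value can then be taken from a t distribution with `df` degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), df)`). When every replicate uses the same orientation, the two terms cannot be separated, and one would instead rely on normalization to remove dye bias and test whether the mean log-ratio differs from zero.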
