MMBGX: a method for estimating expression at the isoform level and detecting differential splicing using whole-transcript Affymetrix arrays

Affymetrix has recently developed whole-transcript GeneChips—‘Gene’ and ‘Exon’ arrays—which interrogate exons along the length of each gene. Although each probe on these arrays is intended to hybridize perfectly to only one transcriptional target, many probes match multiple transcripts located in different parts of the genome or alternative isoforms of the same gene. Existing statistical methods for estimating expression do not take this into account and are thus prone to producing inflated estimates. We propose a method, Multi-Mapping Bayesian Gene eXpression (MMBGX), which disaggregates the signal at ‘multi-match’ probes. When applied to Gene arrays, MMBGX removes the upward bias of gene-level expression estimates. When applied to Exon arrays, it can further disaggregate the signal between alternative transcripts of the same gene, providing expression estimates of individual splice variants. We demonstrate the performance of MMBGX on simulated data and a tissue mixture data set. We then show that MMBGX can estimate the expression of alternative isoforms within one experimental condition, confirming our results by RT-PCR. Finally, we show that our method for detecting differential splicing has a lower error rate than standard exon-level approaches on a previously validated colon cancer data set.
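The idea of disaggregating signal at multi-match probes can be illustrated with a toy calculation. The sketch below is not the MMBGX model, which is Bayesian; it swaps in a plain non-negative least-squares fit as a stand-in, and it assumes a probe's intensity is simply the sum of the expression levels of every transcript it matches. The function names (`naive_estimates`, `disaggregate`), the match matrix and the intensities are all hypothetical, invented for illustration only.

```python
# Toy illustration only: a least-squares stand-in, NOT the MMBGX Bayesian model.
# Assumption: probe intensity = sum of expression of all transcripts it matches.

def naive_estimates(match, intensity):
    """Average the intensity of every probe matching a transcript,
    ignoring that some probes also match other transcripts."""
    n_transcripts = len(match[0])
    estimates = []
    for t in range(n_transcripts):
        values = [y for row, y in zip(match, intensity) if row[t]]
        estimates.append(sum(values) / len(values))
    return estimates

def disaggregate(match, intensity, n_iter=2000):
    """Non-negative least squares by projected gradient descent:
    find x >= 0 minimising 0.5 * ||match @ x - intensity||^2."""
    n_transcripts = len(match[0])
    # Conservative step size: 1 / (max column sum * max row sum) bounds
    # the reciprocal of the largest eigenvalue of match^T match.
    max_row = max(sum(row) for row in match)
    max_col = max(sum(row[t] for row in match) for t in range(n_transcripts))
    step = 1.0 / (max_row * max_col)
    x = [0.0] * n_transcripts
    for _ in range(n_iter):
        residual = [sum(m * xi for m, xi in zip(row, x)) - y
                    for row, y in zip(match, intensity)]
        gradient = [sum(row[t] * r for row, r in zip(match, residual))
                    for t in range(n_transcripts)]
        # Gradient step, then project back onto x >= 0.
        x = [max(0.0, xi - step * g) for xi, g in zip(x, gradient)]
    return x

# Hypothetical data: rows are probes, columns are transcripts A and B.
# Probe 0 matches A only, probe 1 matches both, probe 2 matches B only.
match = [[1, 0],
         [1, 1],
         [0, 1]]
# Noise-free intensities generated from true expression A = 100, B = 300.
intensity = [100.0, 400.0, 300.0]

naive = naive_estimates(match, intensity)   # inflated by the shared probe
estimates = disaggregate(match, intensity)  # shared signal split between A and B
```

In this noise-free toy case the naive averages are 250 for A and 350 for B, both above the true values, because the shared probe's intensity is counted in full for each transcript. The least-squares fit instead recovers values close to 100 and 300. Real array data would also need background, probe affinity and noise terms, which this sketch leaves out.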
