Fast Wavelet Based Functional Models for Transcriptome Analysis with Tiling Arrays

For a better understanding of the biology of an organism, a complete description is needed of all regions of the genome that are actively transcribed. Tiling arrays are used for this purpose. They allow for the discovery of novel transcripts and the assessment of differential expression between two or more experimental conditions such as genotype, treatment, tissue, etc. In tiling array literature, many efforts are devoted to transcript discovery, whereas more recent developments also focus on differential expression. To our knowledge, however, no methods for tiling arrays have been described that can simultaneously assess transcript discovery and identify differentially expressed transcripts. In this paper, we adopt wavelet based functional models to the context of tiling arrays. The high dimensionality of the data triggered us to avoid inference based on Bayesian MCMC methods. Instead, we introduce a fast empirical Bayes method that provides adaptive regularization of the functional effects. A simulation study and a case study illustrate that our approach is well suited for the simultaneous assessment of transcript discovery and differential expression in tiling array studies, and that it outperforms methods that accomplish only one of these tasks.

[1]  Gordon K. Smyth,et al.  Testing significance relative to a fold-change threshold is a TREAT , 2009, Bioinform..

[2]  Rafael A. Irizarry,et al.  A Model-Based Background Adjustment for Oligonucleotide Expression Arrays , 2004 .

[3]  I. Johnstone,et al.  Wavelet Threshold Estimators for Data with Correlated Noise , 1997 .

[4]  J. O. Ramsay,et al.  Functional Data Analysis (Springer Series in Statistics) , 1997 .

[5]  I. Johnstone,et al.  Adapting to Unknown Smoothness via Wavelet Shrinkage , 1995 .

[6]  H. Woo,et al.  Resources for systems biology research , 2006 .

[7]  Xiang-Jun Lu,et al.  Detecting transcriptionally active regions using genomic tiling arrays , 2006, Genome Biology.

[8]  I. Johnstone,et al.  Empirical Bayes selection of wavelet thresholds , 2005, math/0508281.

[9]  L. Penland,et al.  Use of a cDNA microarray to analyse gene expression patterns in human cancer , 1996, Nature Genetics.

[10]  Mark D. Robinson,et al.  FIRMA: a method for detection of alternative splicing from exon array data , 2008, Bioinform..

[11]  P. Brown,et al.  Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[12]  B. Silverman,et al.  Wavelet thresholding via a Bayesian approach , 1998 .

[13]  Mário A. T. Figueiredo,et al.  Wavelet-Based Image Estimation : An Empirical Bayes Approach Using Jeffreys ’ Noninformative Prior , 2001 .

[14]  Deepayan Sarkar,et al.  Detecting differential gene expression with a semiparametric hierarchical mixture method. , 2004, Biostatistics.

[15]  Wolfgang Huber,et al.  A high-resolution map of transcription in the yeast genome. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Purushottam W. Laud,et al.  On Bayesian Analysis of Generalized Linear Models Using Jeffreys's Prior , 1991 .

[17]  A. Bruce,et al.  Understanding WaveShrink: Variance and bias estimation , 1996 .

[18]  Mark Gerstein,et al.  Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. , 2005, Trends in genetics : TIG.

[19]  Jeffrey S. Morris,et al.  Wavelet‐based functional mixed models , 2006, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[20]  Jeffrey S. Morris,et al.  Bayesian Analysis of Mass Spectrometry Proteomic Data Using Wavelet‐Based Functional Mixed Models , 2008, Biometrics.

[21]  Rafael A Irizarry,et al.  Exploration, normalization, and summaries of high density oligonucleotide array probe level data. , 2003, Biostatistics.

[22]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[23]  A. Blais,et al.  E2F-associated chromatin modifiers and cell cycle control. , 2007, Current opinion in cell biology.

[24]  Anestis Antoniadis,et al.  Wavelet methods in statistics: Some recent developments and their applications , 2007, 0712.0283.

[25]  R. Kass,et al.  Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models) , 1989 .

[26]  Thomas E. Royce,et al.  Global Identification of Human Transcribed Sequences with Genome Tiling Arrays , 2004, Science.

[27]  Iris Tzafrir,et al.  A Sequence-Based Map of Arabidopsis Genes with Mutant Phenotypes1,212 , 2003, Plant Physiology.

[28]  M. Clyde,et al.  Multiple shrinkage and subset selection in wavelets , 1998 .

[29]  Wolfgang Huber,et al.  Transcript mapping with high-density oligonucleotide tiling arrays , 2006, Bioinform..

[30]  Stuart Barber,et al.  Posterior probability intervals for wavelet thresholding , 2002 .

[31]  G. Zeller,et al.  Quantitative RNA expression analysis with Affymetrix Tiling 1.0R arrays identifies new E2F target genes. , 2009, The Plant journal : for cell and molecular biology.

[32]  Stéphane Mallat,et al.  A Theory for Multiresolution Signal Decomposition: The Wavelet Representation , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  S. Cawley,et al.  Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. , 2004, Genome research.

[34]  Dirk Inzé,et al.  The ins and outs of the plant cell cycle , 2007, Nature Reviews Molecular Cell Biology.

[35]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .