Can Zipf's law be adapted to normalize microarrays?

Background: Normalization is the process of removing non-biological sources of variation between array experiments. Recent investigations of data in gene expression databases for varying organisms and tissues have shown that the expression levels of the majority of expressed genes follow a power-law distribution with an exponent close to -1 (i.e. they obey Zipf's law). Having observed that our single channel and two channel microarray data sets also followed a power-law distribution, we were motivated to develop a normalization method based on this law and to examine how it compares with existing published techniques. We present a computationally simple and intuitively appealing technique based on this observation.

Results: Using pairwise comparisons on MA plots (log ratio vs. log intensity), we compared this novel method to previously published normalization techniques, namely global normalization to the mean, the quantile method, and a variation on the loess normalization method designed specifically for boutique microarrays. For single channel microarrays, the quantile method was superior at eliminating intensity-dependent effects (banana curves), but Zipf's law normalization did minimize this effect by rotating the data distribution such that the maximal number of data points lie on the zero of the log ratio axis. For two channel boutique microarrays, the Zipf's law normalizations performed as well as, or better than, existing techniques.

Conclusion: Zipf's law normalization is a useful tool where the quantile method cannot be applied, as is the case with microarrays containing functionally specific gene sets (boutique arrays).
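The idea of normalizing against a rank-intensity power law, and of comparing arrays on an MA plot, can be illustrated with a minimal sketch. This is not necessarily the authors' procedure: it assumes that normalization means fitting a straight line to log intensity versus log rank and rescaling each array so that the fitted line has slope -1 and intercept 0. The function names and the `top_fraction` parameter are hypothetical, and all intensities are assumed to be strictly positive.

```python
import numpy as np

def fit_rank_power_law(intensities, top_fraction=0.5):
    """Least-squares fit of log10(intensity) = intercept + slope * log10(rank).

    Only the highest-ranked `top_fraction` of spots is used (an assumption of
    this sketch), since low intensities often deviate from the power law.
    """
    x = np.sort(np.asarray(intensities, dtype=float))[::-1]  # descending order
    n_top = max(2, int(len(x) * top_fraction))
    ranks = np.arange(1, n_top + 1)
    slope, intercept = np.polyfit(np.log10(ranks), np.log10(x[:n_top]), 1)
    return slope, intercept

def zipf_normalize(intensities, top_fraction=0.5):
    """Return log10 intensities rescaled so the fitted line has slope -1, intercept 0."""
    slope, intercept = fit_rank_power_law(intensities, top_fraction)
    if slope >= 0:
        raise ValueError("expected a decreasing rank-intensity relationship")
    log_x = np.log10(np.asarray(intensities, dtype=float))
    # If log_x = intercept + slope * log_rank, the result equals -log_rank.
    return (log_x - intercept) / (-slope)

def ma_values(log_a, log_b):
    """M (log ratio) and A (mean log intensity) for two arrays of log intensities."""
    log_a = np.asarray(log_a, dtype=float)
    log_b = np.asarray(log_b, dtype=float)
    return log_a - log_b, 0.5 * (log_a + log_b)

# Two synthetic arrays that follow exact power laws with different
# scales and exponents (illustrative data, not from the paper).
ranks = np.arange(1, 201)
array_1 = 1000.0 * ranks ** -0.8
array_2 = 500.0 * ranks ** -1.2

m, a = ma_values(zipf_normalize(array_1), zipf_normalize(array_2))
```

In this idealized case both arrays map onto the same reference line, so every M value is zero (up to floating-point error). Real data scatter around the fitted line, and the MA plot then shows how much intensity-dependent curvature remains after normalization.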
