Making sense of microarray data distributions

MOTIVATION: Typical analysis of microarray data has focused on spot-by-spot comparisons within a single organism. Less analysis has been done on comparing the entire distribution of spot intensities between experiments and between organisms.

RESULTS: Here we show that mRNA transcription data from a wide range of organisms, measured with a range of experimental platforms, show close agreement with Benford's law (Benford, Proc. Am. Phil. Soc., 78, 551-572, 1938) and Zipf's law (Zipf, The Psycho-biology of Language: an Introduction to Dynamic Philology, 1936, and Human Behaviour and the Principle of Least Effort, 1949). The bulk of the spot intensity distribution is well approximated by a log-normal, while the tail of the distribution is closer to a power law. The variance, σ², of log spot intensity is positively correlated with genome size (measured as number of genes) and is therefore relatively fixed within some range for a given organism. The measured value of σ² can be significantly smaller than expected if the mRNA is extracted from a sample of mixed cell types. Our research demonstrates that useful biological findings may result from analyzing microarray data at the level of entire intensity distributions.
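The distribution-level checks described above can be illustrated with a minimal sketch. It assumes spot intensities are available as a list of positive floats and uses simulated log-normal data in place of real arrays; the function names, the genome size of 20000, the log-scale spread of 2.5, the top-100 cut-off for the rank plot, and the natural-log base for σ² are all hypothetical choices, not the authors' pipeline. Benford's law gives the leading-digit probability P(d) = log10(1 + 1/d), and a Zipf-style rank plot fits log intensity against log rank. The mixed-cell-type step is a toy model that averages two independent, hypothetical cell-type profiles gene by gene.

```python
import math
import random
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a positive number, read from scientific notation."""
    return int(format(x, ".15e")[0])


def first_digit_frequencies(values):
    """Observed fraction of each leading digit among strictly positive values."""
    digits = [leading_digit(v) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def log_variance(values):
    """Population variance of natural-log intensities (assumed base for sigma^2)."""
    logs = [math.log(v) for v in values if v > 0]
    mean = sum(logs) / len(logs)
    return sum((y - mean) ** 2 for y in logs) / len(logs)


def zipf_slope(values, top=100):
    """Least-squares slope of log10(intensity) vs log10(rank) over the top-ranked spots."""
    ranked = sorted((v for v in values if v > 0), reverse=True)[:top]
    xs = [math.log10(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log10(v) for v in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)
n_genes, sigma = 20000, 2.5  # hypothetical genome size and log-scale spread
single = [rng.lognormvariate(0.0, sigma) for _ in range(n_genes)]

# Toy mixture: average two independent (hypothetical) cell-type profiles per gene.
other = [rng.lognormvariate(0.0, sigma) for _ in range(n_genes)]
mixed = [(a + b) / 2 for a, b in zip(single, other)]

observed = first_digit_frequencies(single)
expected = benford_expected()
for d in range(1, 10):
    print(d, round(observed[d], 3), round(expected[d], 3))
print("sigma^2 single:", round(log_variance(single), 2))
print("sigma^2 mixed: ", round(log_variance(mixed), 2))
print("Zipf slope (top 100):", round(zipf_slope(single), 2))
```

In this toy setting the leading-digit frequencies of a wide log-normal track Benford's law closely, and the averaged "mixed" profile has a smaller log variance than either single profile, which is qualitatively in line with the effect described for mixed cell types. Real data would need background correction and handling of non-positive spots before any of these statistics are meaningful.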

[1] B. Ripley et al. Robust Statistics. Encyclopedia of Mathematical Geosciences, 2018.

[2] G. Zipf. The Psycho-Biology of Language: An Introduction to Dynamic Philology. 1999.

[3] V. Reinke et al. A global profile of germline gene expression in C. elegans. Molecular Cell, 2000.

[4] D. Botstein et al. Genomic expression programs in the response of yeast cells to environmental changes. Molecular Biology of the Cell, 2000.

[5] E. Montroll et al. On 1/f noise and other distributions with long tails. Proceedings of the National Academy of Sciences of the United States of America, 1982.

[6] Partha S. Vasisht. Computational Analysis of Microarray Data. 2003.

[7] A. Michalos et al. Readings in Mathematical Social Science. 1968.

[8] S. Schwartz et al. On the distribution function and moments of power sums with log-normal components. The Bell System Technical Journal, 1982.

[9] George Kingsley Zipf et al. Human Behavior and the Principle of Least Effort. 1949.

[10] U. Alon et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, 1999.

[11] L. Fenton. The Sum of Log-Normal Probability Distributions in Scatter Transmission Systems. 1960.

[12] Alessandro Vespignani et al. Explaining the uneven distribution of numbers in nature: the laws of Benford and Zipf. 2001.

[13] B. Schmeiser et al. Survival Distributions Satisfying Benford's Law. 2000.

[14] G. Somero et al. Hypoxia-induced gene expression profiling in the euryoxic fish Gillichthys mirabilis. Proceedings of the National Academy of Sciences of the United States of America, 2001.

[15] Simon Newcomb et al. Note on the Frequency of Use of the Different Digits in Natural Numbers. 1881.

[16] E. Southern et al. Molecular interactions on microarrays. Nature Genetics, 1999.

[17] Christian A. Rees et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proceedings of the National Academy of Sciences of the United States of America, 1999.

[18] A. Custovic et al. Apoptosis signals in atopy and asthma measured with cDNA arrays. Clinical and Experimental Immunology, 2001.

[19] H. Aburatani et al. Direct comparison of GeneChip and SAGE on the quantitative accuracy in transcript profiling analysis. Genomics, 2000.

[20] Scott A. Rifkin et al. Microarray analysis of Drosophila development during metamorphosis. Science, 1999.

[21] R. A. Raimi. The First Digit Problem. 1976.

[22] Christian A. Rees et al. Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 2000.

[23] A. Wagner et al. Decoupled evolution of coding region and mRNA expression patterns after gene duplication: implications for the neutralist-selectionist debate. Proceedings of the National Academy of Sciences of the United States of America, 2000.

[24] T. Hill. A Statistical Derivation of the Significant-Digit Law. 1995.

[25] Robert J. Schaffer et al. Microarray Analysis of Diurnal and Circadian-Regulated Genes in Arabidopsis. Plant Cell, 2001.

[26] D. Botstein et al. Large-scale identification of secreted and membrane-associated gene products using DNA microarrays. Nature Genetics, 1999.

[27] Harry Eugene Stanley et al. Econophysics: can physicists contribute to the science of economics? Comput. Sci. Eng., 1999.

[28] D. Sornette. Critical Phenomena in Natural Sciences: Chaos, Fractals, Selforganization and Disorder: Concepts and Tools. 2000.

[29] P. Brown et al. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 1997.