Noise Analysis of Duplicated Data on Microarrays Using Mixture Distribution Modeling

We propose a technique for estimating gene expression values from duplicated measurements on cDNA microarrays. The distribution of points in a scatter plot of the duplicates is modeled as a mixture of two-dimensional normal distributions, which represent fluctuations in gene expression values due to noise. The model parameters are estimated with an expectation-maximization (EM) algorithm, and the probability that a duplicated measurement has been shifted by noise is calculated using Bayesian estimation. The technique was tested on six data sets from rice cDNA microarray assays. Genes in the data sets were clustered based on the probability of the true value, and the clustering successfully identified candidate genes regulated by circadian rhythms in rice.
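
The modeling steps summarized above (a mixture of two-dimensional normals, EM fitting, and a Bayesian posterior for each duplicated pair) can be illustrated with a minimal sketch. It is not the authors' implementation: the restriction to two components ("clean" and "noisy"), the synthetic data, the initial guess, and every function name (`normal2d_pdf`, `posteriors`, `fit_mixture`, `simulate_duplicates`, `initial_guess`) are assumptions made purely for illustration.

```python
import math
import random


def normal2d_pdf(x, y, mean, cov):
    """Density of a 2-D normal; cov is the triple (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form d^T Sigma^-1 d via the closed-form 2x2 inverse.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def posteriors(points, weights, means, covs):
    """Bayes rule (also the E-step): P(component k | point) for each point."""
    out = []
    for x, y in points:
        joint = [w * normal2d_pdf(x, y, m, c)
                 for w, m, c in zip(weights, means, covs)]
        total = sum(joint)
        k = len(joint)
        # Fall back to a uniform posterior if every density underflows.
        out.append([j / total for j in joint] if total > 0.0 else [1.0 / k] * k)
    return out


def fit_mixture(points, weights, means, covs, n_iter=100, ridge=1e-6):
    """Plain EM for a 2-D Gaussian mixture with full covariances."""
    n = len(points)
    weights, means, covs = list(weights), list(means), list(covs)
    for _ in range(n_iter):
        resp = posteriors(points, weights, means, covs)  # E-step
        for k in range(len(weights)):                    # M-step
            r = [row[k] for row in resp]
            nk = max(sum(r), 1e-12)
            mx = sum(ri * p[0] for ri, p in zip(r, points)) / nk
            my = sum(ri * p[1] for ri, p in zip(r, points)) / nk
            sxx = sum(ri * (p[0] - mx) ** 2 for ri, p in zip(r, points)) / nk
            syy = sum(ri * (p[1] - my) ** 2 for ri, p in zip(r, points)) / nk
            sxy = sum(ri * (p[0] - mx) * (p[1] - my)
                      for ri, p in zip(r, points)) / nk
            weights[k] = nk / n
            means[k] = (mx, my)
            # Small ridge on the diagonal keeps the covariance invertible.
            covs[k] = (sxx + ridge, sxy, syy + ridge)
    return weights, means, covs


def simulate_duplicates(n_clean, n_noisy, seed=0):
    """Hypothetical data: each gene has a true log-intensity measured twice."""
    rng = random.Random(seed)
    points = []
    for i in range(n_clean + n_noisy):
        sd = 0.2 if i < n_clean else 1.0  # assumed noise levels
        t = rng.gauss(9.0, 1.5)           # assumed spread of true values
        points.append((t + rng.gauss(0.0, sd), t + rng.gauss(0.0, sd)))
    return points


def initial_guess(points):
    """Start both components at the sample mean; differ only in correlation."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    vx = sum((p[0] - mx) ** 2 for p in points) / n
    vy = sum((p[1] - my) ** 2 for p in points) / n
    tight = (vx, 0.9 * math.sqrt(vx * vy), vy)  # component 0: near the diagonal
    broad = (vx, 0.0, vy)                       # component 1: scattered
    return [0.5, 0.5], [(mx, my), (mx, my)], [tight, broad]


if __name__ == "__main__":
    pts = simulate_duplicates(300, 100, seed=1)
    w, m, c = fit_mixture(pts, *initial_guess(pts))
    for query in [(9.0, 9.1), (9.0, 12.0)]:
        p_clean, p_noisy = posteriors([query], w, m, c)[0]
        print(query, "P(clean) = %.3f, P(noisy) = %.3f" % (p_clean, p_noisy))
```

In this sketch, a duplicated pair lying close to the diagonal of the scatter plot should receive a high posterior for the tight component, while a pair far from the diagonal should be attributed to the broad one. Per-gene posteriors of this kind are the sort of quantity that could then be passed to a clustering step.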
