GenXHC: a probabilistic generative model for cross-hybridization compensation in high-density genome-wide microarray data

MOTIVATION: Microarray designs containing millions to hundreds of millions of probes that tile entire genomes are currently being released. Within the next two months, our group will release a microarray data set containing over 12,000,000 measurements taken from 37 mouse tissues. Removing cross-hybridization noise is a problem that will become increasingly significant in the upcoming era of genome-wide exon-tiling microarray experiments. We present a probabilistic generative model for cross-hybridization in microarray data, together with a corresponding variational learning method for cross-hybridization compensation, GenXHC. The method reduces cross-hybridization noise by taking into account multiple sources for each mRNA expression level measurement, as well as prior knowledge of hybridization similarities between the nucleotide sequences of microarray probes and their target cDNAs.

RESULTS: The algorithm is applied to a subset of an exon-resolution genome-wide Agilent microarray data set for chromosome 16 of Mus musculus and is found to produce statistically significant reductions in cross-hybridization noise. The denoised data are found to produce enrichment in multiple gene ontology-biological process (GO-BP) functional groups. The algorithm is also found to outperform robust multi-array analysis (RMA), another method for cross-hybridization compensation.
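
The idea of treating each probe measurement as a combination of several transcript sources can be illustrated with a toy linear mixing model. The sketch below is not the GenXHC algorithm: it assumes a small, fully known hybridization weight matrix and replaces the variational learning method with ordinary least squares. The function name `simulate_and_compensate`, the weight values, the noise level, and the simulated expression data are all hypothetical; only the count of 37 tissues is taken from the text above.

```python
import numpy as np

def simulate_and_compensate(seed=0):
    """Toy illustration of cross-hybridization as linear mixing (not GenXHC)."""
    rng = np.random.default_rng(seed)
    n_transcripts, n_tissues = 4, 37

    # Hypothetical hybridization weights: row i is probe i, column j is transcript j.
    # Diagonal entries are the intended targets; off-diagonal entries are
    # assumed cross-hybridization strengths (in practice these would come from
    # sequence similarity between probes and cDNAs).
    weights = np.array([
        [1.0, 0.3, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.2],
        [0.0, 0.0, 1.0, 0.0],
        [0.4, 0.0, 0.0, 1.0],
    ])

    # Simulated "true" expression levels and additive measurement noise.
    true_expr = rng.gamma(shape=2.0, scale=1.0, size=(n_transcripts, n_tissues))
    noise = rng.normal(scale=0.05, size=(n_transcripts, n_tissues))

    # Each observed probe intensity mixes several transcript sources.
    observed = weights @ true_expr + noise

    # Stand-in for inference: solve the mixing equations by least squares.
    estimate, *_ = np.linalg.lstsq(weights, observed, rcond=None)

    # Mean squared error before and after compensation.
    err_raw = float(np.mean((observed - true_expr) ** 2))
    err_comp = float(np.mean((estimate - true_expr) ** 2))
    return err_raw, err_comp

if __name__ == "__main__":
    raw, comp = simulate_and_compensate()
    print(f"MSE without compensation: {raw:.4f}")
    print(f"MSE with compensation:    {comp:.4f}")
```

In this toy setting the weights are given exactly, so unmixing is a direct linear solve. A probabilistic treatment such as the one described above would instead place distributions over the unknown quantities and infer them from the data.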
