A review of statistical methods for preprocessing oligonucleotide microarrays

Microarrays have become an indispensable tool in biomedical research. This powerful technology makes it possible to quantify a large number of nucleic acid molecules simultaneously, but the data it produces are subject to many sources of noise. A number of preprocessing steps are therefore needed to convert the raw data, usually in the form of hybridisation images, into measures of biological meaning that can be used in further statistical analysis. Preprocessing of oligonucleotide arrays includes image processing, background adjustment, data normalisation/transformation and, when multiple probes target one genomic unit, summarisation. In this article, we review the issues encountered in each preprocessing step and introduce the statistical models and methods used to address them.
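
As a rough illustration of three of the steps named above (normalisation, transformation and summarisation), the following sketch chains quantile normalisation, a log2 transform and Tukey's median polish on a small probes x arrays matrix. These method choices, the function names, the offset of 1 in the log transform and the order in which the steps are applied are assumptions made for illustration, not necessarily the methods this review recommends; image processing and background adjustment are omitted.

```python
import math
from statistics import median


def log2_transform(matrix, offset=1.0):
    """Log2-transform intensities; the offset (an assumption) guards against log2(0)."""
    return [[math.log2(x + offset) for x in row] for row in matrix]


def quantile_normalise(matrix):
    """Quantile normalisation of a probes x arrays matrix.

    Every array (column) is given the same empirical distribution: the k-th
    smallest value in each column is replaced by the mean of the k-th smallest
    values across all columns. Ties are broken by sort order, a simplification.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # For each column, row indices ordered from smallest to largest value.
    order = [sorted(range(n_rows), key=lambda i, j=j: matrix[i][j]) for j in range(n_cols)]
    # Mean across arrays of the values sitting at each rank.
    rank_means = [
        sum(matrix[order[j][k]][j] for j in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for k, i in enumerate(order[j]):
            out[i][j] = rank_means[k]
    return out


def median_polish(matrix, max_iter=10, eps=0.01):
    """Tukey's median polish for one probe set (probes x arrays).

    Fits y_ij = overall + row_i + col_j + resid_ij by alternately sweeping out
    row and column medians; returns overall + col_j, one summary per array.
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row medians into the row (probe) effects.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep column medians into the column (array) effects.
        for j in range(n_cols):
            m = median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop once the total absolute residual stops changing appreciably.
        new_sum = sum(abs(v) for row in resid for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + c for c in col_eff]


if __name__ == "__main__":
    raw = [[120.0, 150.0], [300.0, 280.0], [90.0, 110.0]]  # 3 probes, 2 arrays
    print(median_polish(log2_transform(quantile_normalise(raw))))
```

For a matrix `raw` holding the probes of a single probe set, `median_polish(log2_transform(quantile_normalise(raw)))` returns one expression value per array. In practice quantile normalisation would be applied across all probes on the arrays before the data are split by probe set for summarisation.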
