Orthogonal projections to latent structures as a strategy for microarray data normalization

Background
During the generation of microarray data, various forms of systematic bias are frequently introduced that limit the accuracy and precision of the results. To estimate biological effects properly, these biases must be identified and removed.

Results
We introduce a normalization strategy for multi-channel microarray data based on orthogonal projections to latent structures (OPLS), a multivariate regression method. We illustrate the effect of applying the normalization methodology to single-channel Affymetrix data as well as to dual-channel cDNA data. Using sensitivity and specificity estimated from external (spike-in) controls, we compare the strategy side by side with a wide range of commonly used normalization methods that have diverse properties and strengths. On the illustrated data sets, the OPLS normalization strategy achieves the highest average true-negative and true-positive rates of the evaluated methods.

Conclusion
The OPLS methodology identifies joint variation within biological samples, which enables the removal of sources of variation that are uncorrelated (orthogonal) to that within-sample variation. This separates structured variation related to the underlying biological samples from the remaining, bias-related sources of systematic variation. As a consequence, the methodology requires no explicit knowledge of the presence or characteristics of specific biases. Furthermore, it does not assume that the majority of elements are non-differentially expressed, which makes it applicable to specialized boutique arrays.
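The core mechanism described in the Conclusion, removing structured variation in the data matrix X that is orthogonal to a response, can be illustrated with a minimal sketch of the generic single-response O-PLS filtering step (Trygg and Wold's algorithm). This is not the authors' normalization procedure. It assumes a single response vector `y`, here a hypothetical two-group indicator standing in for the biological-sample structure, and a fixed number of orthogonal components `n_orth`, whereas a real analysis would have to choose the number of components, for example by cross-validation. The names `opls_filter`, `n_orth`, `tol` and the toy data are illustrative only; plain Python lists are used so the sketch has no dependencies.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _center_columns(X):
    """Subtract each column mean so every variable is mean-centred."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def _mat_vec(X, v):
    """X (n x p) times v (length p), giving a length-n score vector."""
    return [_dot(row, v) for row in X]


def _mat_t_vec(X, u):
    """X' (p x n) times u (length n), giving a length-p loading vector."""
    return [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(len(X[0]))]


def opls_filter(X, y, n_orth=1, tol=1e-12):
    """Remove up to n_orth y-orthogonal components from X (hypothetical helper).

    Returns the filtered (centred) matrix plus orthogonal scores T_o and loadings P_o.
    """
    X = _center_columns(X)
    y_mean = sum(y) / len(y)
    y = [v - y_mean for v in y]

    # Predictive weight w is proportional to X'y, scaled to unit length.
    w = _mat_t_vec(X, y)
    w_norm = math.sqrt(_dot(w, w))
    if w_norm < tol:
        raise ValueError("X carries no variation correlated with y")
    w = [v / w_norm for v in w]

    T_o, P_o = [], []
    for _ in range(n_orth):
        t = _mat_vec(X, w)                       # predictive score
        tt = _dot(t, t)
        p = [v / tt for v in _mat_t_vec(X, t)]   # predictive loading
        # The part of p not along w is structured variation orthogonal to y.
        wp = _dot(w, p)
        w_o = [pj - wp * wj for pj, wj in zip(p, w)]
        w_o_norm = math.sqrt(_dot(w_o, w_o))
        if w_o_norm < tol:                       # nothing orthogonal left to remove
            break
        w_o = [v / w_o_norm for v in w_o]
        t_o = _mat_vec(X, w_o)                   # orthogonal score (uncorrelated with y)
        to_to = _dot(t_o, t_o)
        p_o = [v / to_to for v in _mat_t_vec(X, t_o)]  # orthogonal loading
        # Deflate: X <- X - t_o p_o'
        X = [[X[i][j] - t_o[i] * p_o[j] for j in range(len(p_o))]
             for i in range(len(X))]
        T_o.append(t_o)
        P_o.append(p_o)
    return X, T_o, P_o


# Hypothetical toy data: y is a two-group indicator, b is a structured
# pattern orthogonal to y that leaks into the first two variables.
y_demo = [1, 1, 1, -1, -1, -1]
b_demo = [2, -1, -1, 1, 1, -2]
X_demo = [[yi + bi, bi, yi] for yi, bi in zip(y_demo, b_demo)]

X_filtered, T_o, P_o = opls_filter(X_demo, y_demo, n_orth=1)
# Each row of X_filtered is approximately [y_i, 0, y_i]: the b pattern is removed.
```

In the toy data, the first variable (y plus b) is reduced to its y part, the purely orthogonal second variable is removed entirely, and the third is left untouched. Because each orthogonal score is uncorrelated with `y`, the deflation leaves `X'y` unchanged, so the y-related signal is preserved while the orthogonal structure is discarded.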
