Optimized LOWESS normalization parameter selection for DNA microarray data

Background
Microarray data normalization is an important step for obtaining data that are reliable and usable for subsequent analysis. One of the most commonly used normalization techniques is the locally weighted scatterplot smoothing (LOWESS) algorithm. However, an often overlooked concern with LOWESS normalization is the choice of appropriate parameters. Parameters are usually chosen arbitrarily, which may reduce the efficiency of the normalization and result in non-optimally normalized data. Thus, there is a need to explore LOWESS parameter selection in greater detail.

Results and discussion
In this work, we discuss how to choose parameters for the LOWESS method. Moreover, we present an optimization approach for obtaining the fraction of data points used in the local regression and analyze results for local print-tip normalization. The optimization procedure determines the bandwidth parameter for the local regression by minimizing a cost function that represents the mean-squared difference between the LOWESS estimates and the normalization reference level. We demonstrate the utility of the systematic parameter selection using two publicly available data sets. The first data set consists of three self versus self hybridizations, which allow for a quantitative study of the optimization method. The second data set contains a collection of DNA microarray data from a breast cancer study using four breast cancer cell lines. Our results show that different choices of the bandwidth parameter yield dramatically different calibration results in both studies.

Conclusions
Results derived from the self versus self experiment indicate that the proposed optimization approach is a plausible solution for estimating the LOWESS parameters, while results from the breast cancer experiment show that the optimization procedure is readily applicable to real-life microarray data normalization. In summary, a systematic approach to obtaining critical parameters in the LOWESS technique is likely to produce data that optimally meet the assumptions made in the data preprocessing step, and thereby makes studies that use the LOWESS method unambiguous and easier to repeat.
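
The bandwidth selection described above can be illustrated with a small grid search. The following is a minimal sketch, not the authors' implementation: it assumes log-ratio values M plotted against log-intensity values A, a reference level of zero (as would be expected for a self versus self hybridization), a plain tricube-weighted local linear fit without robustness iterations, and a hold-out split of the spots. The function names (`lowess`, `cost`, `select_fraction`), the hold-out scheme, the candidate fractions, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def lowess(x, y, frac, x_eval=None):
    """Tricube-weighted local linear fit (no robustness iterations)."""
    x_eval = x if x_eval is None else x_eval
    # Number of neighbours in each local window, set by the fraction.
    k = min(len(x), max(2, int(math.ceil(frac * len(x)))))
    fitted = []
    for x0 in x_eval:
        dist = sorted(abs(xi - x0) for xi in x)
        h = dist[k - 1] * 1.0001 + 1e-12  # slightly past the k-th neighbour
        sw = swx = swy = swxx = swxy = 0.0
        for xi, yi in zip(x, y):
            u = abs(xi - x0) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3  # tricube weight
            d = xi - x0
            sw += w
            swx += w * d
            swy += w * yi
            swxx += w * d * d
            swxy += w * d * yi
        det = sw * swxx - swx * swx
        if abs(det) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: weighted mean
        else:
            # Intercept of the weighted least-squares line, evaluated at x0.
            fitted.append((swxx * swy - swx * swxy) / det)
    return fitted


def cost(frac, a_fit, m_fit, a_chk, m_chk, reference=0.0):
    """Mean-squared difference between LOWESS estimates and the reference.

    Assumption of this sketch: held-out spots are normalized with a curve
    fitted on the other spots, and the cost is measured on a LOWESS curve
    refitted to those normalized held-out spots.
    """
    curve = lowess(a_fit, m_fit, frac, a_chk)
    m_norm = [m - c for m, c in zip(m_chk, curve)]
    trend = lowess(a_chk, m_norm, frac)
    return sum((t - reference) ** 2 for t in trend) / len(trend)


def select_fraction(a, m, fractions, seed=0):
    """Return the candidate fraction with the lowest cost, plus all costs."""
    idx = list(range(len(a)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    fit, chk = idx[:half], idx[half:]
    a_fit, m_fit = [a[i] for i in fit], [m[i] for i in fit]
    a_chk, m_chk = [a[i] for i in chk], [m[i] for i in chk]
    costs = {f: cost(f, a_fit, m_fit, a_chk, m_chk) for f in fractions}
    return min(costs, key=costs.get), costs


# Synthetic data standing in for one self versus self slide (illustrative only).
rng = random.Random(1)
A = [rng.uniform(6.0, 14.0) for _ in range(200)]
M = [0.4 * math.sin(a) + rng.gauss(0.0, 0.2) for a in A]
candidates = [0.1, 0.2, 0.4, 0.7, 1.0]
best_f, costs = select_fraction(A, M, candidates)
print("selected fraction:", best_f)
```

In a print-tip setting, one plausible use would be to call `select_fraction` once per print-tip group so that each group receives its own fraction; whether that matches the procedure used in the study is not something the abstract specifies.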
