Shrinkage regression-based methods for microarray missing value imputation

Background: Missing values are common in microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, regression-based methods are very popular and have been shown to perform better than the other types of methods on many test microarray datasets.

Results: To further improve the performance of regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in microarray data and select genes similar to the target gene by Pearson correlation coefficient. In addition, our methods incorporate the least squares principle, use a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the adjusted coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation on six test microarray datasets than the existing regression-based methods do.

Conclusions: Imputation of missing values is a very important aspect of microarray data analysis because most downstream analyses require a complete dataset. Exploring accurate and efficient methods for estimating missing values has therefore become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
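The three steps named in the Results (Pearson-based selection of similar genes, a least squares fit, and shrinkage of the fitted coefficients) can be illustrated with a minimal sketch. The abstract does not specify the shrinkage estimator, so the positive-part James-Stein-type factor used here is an assumption, as are the function name `shrinkage_impute`, the parameter `k`, and the restriction to candidate genes with no missing values. It should be read as an illustration of the general idea, not as the authors' implementation.

```python
import numpy as np

def shrinkage_impute(data, target, k=3):
    """Impute NaNs in row `target` of `data` (genes x samples).

    Hypothetical sketch: the shrinkage factor below is an assumed
    James-Stein-type form, not taken from the paper.
    """
    y = data[target]
    obs = ~np.isnan(y)          # columns where the target gene is observed
    miss = ~obs

    # Candidate genes: every other gene with no missing values (assumption).
    candidates = [i for i in range(data.shape[0])
                  if i != target and not np.isnan(data[i]).any()]

    # Step 1: rank candidates by |Pearson correlation| with the target gene.
    corr = np.array([abs(np.corrcoef(y[obs], data[i, obs])[0, 1])
                     for i in candidates])
    corr = np.nan_to_num(corr)  # constant genes yield NaN; treat as 0
    top = [candidates[i] for i in np.argsort(corr)[::-1][:k]]

    # Step 2: least squares fit of the target on the selected genes (centered).
    A = data[top][:, obs].T     # shape (n_obs, k)
    a_mean, y_mean = A.mean(axis=0), y[obs].mean()
    Ac, yc = A - a_mean, y[obs] - y_mean
    beta, *_ = np.linalg.lstsq(Ac, yc, rcond=None)

    # Step 3: shrink the coefficients toward zero (assumed positive-part form).
    resid = yc - Ac @ beta
    dof = max(int(obs.sum()) - len(top) - 1, 1)
    sigma2 = float(resid @ resid) / dof
    fit_ss = float((Ac @ beta) @ (Ac @ beta))
    shrink = max(0.0, 1.0 - (len(top) - 2) * sigma2 / fit_ss) if fit_ss > 0 else 0.0
    beta_shrunk = shrink * beta

    # Predict the missing entries with the shrunken coefficients.
    filled = y.copy()
    filled[miss] = y_mean + (data[top][:, miss].T - a_mean) @ beta_shrunk
    return filled, shrink
```

When the selected genes explain the target well, the residual variance is small and the factor stays close to 1, so the estimate is close to the ordinary least squares prediction. A noisy fit pulls the coefficients, and hence the imputed value, toward the target gene's observed mean.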
