An Algorithm for Missing Value Estimation for DNA Microarray Data

Gene expression data matrices often contain missing expression values. In this paper, we describe a new algorithm, named the improved fixed rank approximation algorithm (IFRAA), for estimating missing values in large gene expression data matrices. We compare the algorithm with two existing and widely used methods for reconstructing missing entries in DNA microarray gene expression data: Bayesian principal component analysis (BPCA) and local least squares imputation (LLS). The three algorithms were applied to four microarray data sets and two synthetic low-rank data matrices. Fixed percentages of the entries of each data set were deleted at random, and the three algorithms were used to recover them. We conclude that IFRAA appears to be the most reliable and accurate of the three approaches for recovering missing DNA microarray gene expression data, or the missing entries of any other noisy data matrix that is effectively low rank.
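The evaluation protocol described above (delete a fraction of the entries at random, impute them, and compare against the held-out truth) can be sketched as follows. This is a minimal illustration only: the imputer is a generic iterative truncated-SVD fill, used here as a stand-in and not the IFRAA, BPCA, or LLS algorithm. The function names, matrix sizes, noise level, and the NRMSE normalization are all assumptions made for the example.

```python
import numpy as np


def make_low_rank(n_rows, n_cols, rank, noise, rng):
    """Synthetic noisy matrix that is effectively low rank (sizes are illustrative)."""
    left = rng.standard_normal((n_rows, rank))
    right = rng.standard_normal((rank, n_cols))
    return left @ right + noise * rng.standard_normal((n_rows, n_cols))


def delete_entries(X, fraction, rng):
    """Mark roughly `fraction` of the entries as missing (NaN), chosen at random."""
    mask = rng.random(X.shape) < fraction
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def row_mean_fill(X_missing):
    """Baseline: replace each missing entry with the mean of its row's observed entries."""
    mask = np.isnan(X_missing)
    row_means = np.nanmean(X_missing, axis=1, keepdims=True)
    return np.where(mask, row_means, X_missing)


def impute_fixed_rank(X_missing, rank, n_iter=100):
    """Generic iterative rank-`rank` SVD imputation (a stand-in, NOT the paper's IFRAA)."""
    mask = np.isnan(X_missing)
    X = row_mean_fill(X_missing)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed entries fixed; update only the missing ones.
        X = np.where(mask, low_rank, X_missing)
    return X


def nrmse(X_true, X_est, mask):
    """RMS error on the deleted entries, normalized by their standard deviation (assumed definition)."""
    diff = X_est[mask] - X_true[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(X_true[mask])


rng = np.random.default_rng(0)
X_true = make_low_rank(200, 30, rank=3, noise=0.1, rng=rng)
X_missing, mask = delete_entries(X_true, fraction=0.10, rng=rng)

baseline_score = nrmse(X_true, row_mean_fill(X_missing), mask)
svd_score = nrmse(X_true, impute_fixed_rank(X_missing, rank=3), mask)
print(f"row-mean NRMSE: {baseline_score:.3f}, fixed-rank SVD NRMSE: {svd_score:.3f}")
```

Repeating the run for several deletion fractions and random seeds would give error curves of the kind used to compare imputation methods; any real comparison would substitute the actual algorithms and data sets.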
