Use of biclustering for missing value imputation in gene expression data

DNA microarray data almost always contain missing values. Because subsequent analyses such as biclustering can be applied only to complete data, these missing values must be imputed before any biclusters can be detected. Existing imputation methods exploit coherence among expression values in the microarray data. Since biclustering attempts to find correlated expression values within the data, we propose to combine missing value imputation and biclustering into a single framework in which the two processes are performed iteratively. In this way, missing value imputation can improve bicluster analysis, and the coherence in detected biclusters can be exploited for better missing value estimation. Experiments were conducted on artificial and real datasets to verify the effectiveness of the proposed algorithm in reducing the estimation errors of missing values.
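
The iterative idea can be illustrated with a minimal sketch. It is not the algorithm proposed here, whose details the abstract does not give. As a stand-in for bicluster detection, it selects the `k` rows whose mean-centred pattern best matches the target gene, then re-estimates each missing entry from that coherent block under an additive model (row mean plus column mean minus block mean). The function name `impute_with_coherent_rows`, the parameters `k` and `n_iter`, and the use of `None` to mark missing entries are all assumptions made for illustration.

```python
def impute_with_coherent_rows(data, k=2, n_iter=5):
    """Illustrative stand-in only: alternate between selecting coherent rows
    and re-estimating missing entries. Assumes at least 2 rows and 2 columns,
    with missing entries marked as None."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [(i, j) for i in range(n_rows) for j in range(n_cols)
               if data[i][j] is None]

    def mean(values):
        return sum(values) / len(values)

    # Initial fill: replace each missing entry with the mean of the observed
    # values in its row (0.0 if the whole row is missing).
    filled = []
    for row in data:
        observed = [v for v in row if v is not None]
        row_mean = mean(observed) if observed else 0.0
        filled.append([row_mean if v is None else float(v) for v in row])

    for _ in range(n_iter):
        updated = [row[:] for row in filled]
        for i, j in missing:
            cols = [c for c in range(n_cols) if c != j]
            target = [filled[i][c] for c in cols]
            t_mean = mean(target)

            def shape_distance(r):
                # Compare mean-centred profiles, so that rows differing only
                # by a constant shift count as coherent.
                other = [filled[r][c] for c in cols]
                o_mean = mean(other)
                return sum(((a - t_mean) - (b - o_mean)) ** 2
                           for a, b in zip(target, other))

            # "Detection" step: the k rows most coherent with the target row.
            rows = sorted((r for r in range(n_rows) if r != i),
                          key=shape_distance)[:k]

            # "Imputation" step: additive-model estimate from the block.
            col_mean = mean([filled[r][j] for r in rows])
            block_mean = mean([filled[r][c] for r in rows for c in cols])
            updated[i][j] = t_mean + col_mean - block_mean
        filled = updated
    return filled


expression = [
    [0, 1, 2, 3],
    [1, 2, None, 4],   # true value is 3 under the additive pattern
    [2, 3, 4, 5],
    [5, 0, 7, 1],      # row that does not follow the pattern
]
completed = impute_with_coherent_rows(expression, k=2)
```

In this toy matrix the first three rows differ only by a constant shift, so the incoherent fourth row is excluded from the block used for estimation. When several entries are missing, later iterations reuse the current estimates both for selecting rows and for computing the block means, which is the sense in which the two steps inform each other.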
