Triple imputation for microarray missing value estimation

Data from gene expression microarray experiments commonly contain missing values, which arise for a variety of reasons. Many downstream analyses of gene expression data, however, require complete data, so imputation methods with high estimation precision are critical. Inspired by tri-training, a semi-supervised learning approach, we propose a novel imputation method called TRIIM (TRIple IMputation). TRIIM estimates missing values with three imputation strategies based on Bayesian principal component analysis (BPCA), local least squares (LLS) and expectation maximization (EM), so that global correlation information, local structure and data distribution are all taken into account. At each step, the values estimated jointly by any two of the algorithms are shared with the third, and the final estimate is assembled by combining the imputation results of all three. Experimental results on four real microarray matrices show that TRIIM achieves a lower normalized root mean square error (NRMSE) than the algorithms it is compared with, even for datasets with high missing rates and few complete genes.
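The NRMSE evaluation mentioned above can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the definition common in the microarray imputation literature: root mean square error over the artificially removed entries, divided by the standard deviation of their true values. The names `nrmse` and `row_mean_impute` are hypothetical, the data are synthetic, and the row-mean imputer is only a stand-in baseline, not BPCA, LLS, EM or TRIIM.

```python
import numpy as np


def nrmse(true_values, estimated_values):
    """RMSE normalized by the standard deviation of the true values.

    Assumed definition; the abstract does not state the exact formula.
    """
    true_values = np.asarray(true_values, dtype=float)
    estimated_values = np.asarray(estimated_values, dtype=float)
    rmse = np.sqrt(np.mean((estimated_values - true_values) ** 2))
    return rmse / np.std(true_values)


def row_mean_impute(matrix):
    """Stand-in imputer: fill each NaN with the mean of its row (gene)."""
    filled = matrix.copy()
    row_means = np.nanmean(matrix, axis=1)
    rows, cols = np.where(np.isnan(matrix))
    filled[rows, cols] = row_means[rows]
    return filled


# Synthetic "complete" matrix: 50 genes (rows) by 10 samples (columns).
rng = np.random.default_rng(0)
complete = rng.normal(size=(50, 10))

# Hide roughly 10% of the entries at random to simulate missing values.
mask = rng.random(complete.shape) < 0.1
observed = complete.copy()
observed[mask] = np.nan

# Impute, then score only the entries that were hidden.
imputed = row_mean_impute(observed)
score = nrmse(complete[mask], imputed[mask])
print(f"NRMSE at ~10% missing: {score:.3f}")
```

Under this definition, a perfect imputation scores 0 and filling every hidden entry with the mean of the true hidden values scores 1, so lower values indicate better estimates.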
