A Bicluster-Based Bayesian Principal Component Analysis Method for Microarray Missing Value Estimation

Data generated from microarray experiments often suffer from missing values. Because most downstream analyses require complete matrices as input, these missing values have to be estimated. Bayesian principal component analysis (BPCA) is a well-known method for microarray missing value estimation, but it performs poorly on datasets with strong local similarity structure. This paper proposes a bicluster-based BPCA (bi-BPCA) method that exploits the local structure of the matrix. For each missing entry, the genes and experimental conditions most correlated with it are grouped into a bicluster, and BPCA is applied to these biclusters to estimate the missing values. An automatic parameter learning scheme is also developed to select the parameters. Experimental results on four real microarray matrices indicate that bi-BPCA obtains the lowest normalized root-mean-square error at 82.14% of the missing rates tested.
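The evaluation criterion named above can be illustrated with a minimal sketch. It assumes the definition of normalized root-mean-square error that is common in the imputation literature (root-mean-square error over the artificially masked entries, divided by the standard deviation of their true values); the abstract does not state the exact formula. The synthetic matrix, the 5% missing rate, and the helper names `nrmse`, `mask_entries`, and `row_mean_impute` are all hypothetical, and the row-mean imputer is only a placeholder baseline, not bi-BPCA.

```python
import numpy as np

def nrmse(estimated, truth):
    """RMS error divided by the standard deviation of the true values.

    Assumed definition; the abstract does not give the formula.
    """
    estimated = np.asarray(estimated, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.sqrt(np.mean((estimated - truth) ** 2)) / np.std(truth)

def mask_entries(matrix, missing_rate, rng):
    """Return a copy with a random fraction of entries set to NaN, plus the mask."""
    mask = rng.random(matrix.shape) < missing_rate
    masked = matrix.copy()
    masked[mask] = np.nan
    return masked, mask

def row_mean_impute(masked):
    """Placeholder imputer (NOT bi-BPCA): fill each gap with its gene's row mean."""
    filled = masked.copy()
    row_means = np.nanmean(masked, axis=1)
    rows, cols = np.where(np.isnan(masked))
    filled[rows, cols] = row_means[rows]
    return filled

# Synthetic stand-in for a complete genes-by-conditions expression matrix.
rng = np.random.default_rng(0)
complete = rng.normal(size=(200, 20))

# Hide 5% of the entries, impute them, and score only the hidden entries.
masked, mask = mask_entries(complete, 0.05, rng)
filled = row_mean_impute(masked)
score = nrmse(filled[mask], complete[mask])
print(f"NRMSE of the row-mean baseline: {score:.3f}")
```

Repeating this procedure over several missing rates and swapping in different imputers is one way to produce the kind of per-missing-rate comparison the abstract summarizes.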
