Towards better accuracy for missing value estimation of epistatic miniarray profiling data by a novel ensemble approach.

Epistatic miniarray profiling (E-MAP) is a powerful tool for analyzing gene functions and their biological relevance. However, E-MAP data suffer from a large proportion of missing values, which often leads to misleading and biased analysis results. Effective missing value estimation methods for E-MAP are therefore urgently needed. Although several independent algorithms can be applied to this task, their performance varies significantly across datasets, indicating that each algorithm has its own advantages and disadvantages. In this paper, we propose EMDI, a novel ensemble approach based on high-level diversity that imputes missing values by combining two global and four local base estimators. Experimental results on five E-MAP datasets show that EMDI outperforms every single base algorithm, demonstrating that an appropriate combination exploits the complementarity among different methods. Comparisons among several fusion strategies also show that the proposed high-level diversity scheme is superior to the alternatives. EMDI is freely available at www.csbio.sjtu.edu.cn/bioinf/EMDI/.
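
The general idea of fusing global and local estimators can be illustrated with a minimal sketch. It is not the EMDI implementation: the abstract does not specify the six base estimators or the high-level diversity scheme, so the two estimators below (a row-mean fill standing in for a global method and a nearest-row fill standing in for a local method), the weighted-average fusion, and all function names are assumptions chosen for illustration. The sketch also ignores the symmetry of real E-MAP matrices and assumes every row has at least one observed value.

```python
import numpy as np

def row_mean_impute(X):
    """Stand-in 'global' estimator: fill each missing entry with its row mean."""
    filled = X.copy()
    row_means = np.nanmean(X, axis=1)  # assumes no row is entirely missing
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = row_means[rows]
    return filled

def knn_impute(X, k=2):
    """Stand-in 'local' estimator: fill from the k most similar rows."""
    filled = X.copy()
    fallback = row_mean_impute(X)  # used when no neighbour observes the column
    n = X.shape[0]
    for i in range(n):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        # Root-mean-square distance over columns observed in both rows.
        dists = np.full(n, np.inf)
        for j in range(n):
            if j == i:
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if shared.any():
                dists[j] = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
        neighbours = np.argsort(dists)[:k]
        neighbours = neighbours[np.isfinite(dists[neighbours])]
        for c in np.where(missing)[0]:
            vals = X[neighbours, c]
            vals = vals[~np.isnan(vals)]
            filled[i, c] = vals.mean() if vals.size else fallback[i, c]
    return filled

def ensemble_impute(X, estimators, weights=None):
    """Fuse several imputations by a weighted average (an assumed fusion rule)."""
    preds = np.stack([est(X) for est in estimators])
    w = np.ones(len(estimators)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    combined = np.tensordot(w, preds, axes=1)
    # Observed entries are kept; only missing entries take the fused estimate.
    return np.where(np.isnan(X), combined, X)
```

Calling `ensemble_impute(X, [row_mean_impute, knn_impute])` on a matrix `X` with `np.nan` marking missing scores returns a completed matrix. Non-uniform `weights` could, for example, be derived from each estimator's error on artificially masked entries, which is one common way to favor the estimators that suit a particular dataset.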
