Imputing missing values in microarray data with ontology information

Microarray technology is a major advance in bioinformatics. The large volumes of data it produces contain hidden information about molecular functions and other essential biological meaning for scientists to study and analyze. However, these data often contain a portion of entries that are missing. Several methods have been developed to estimate these missing values, but most of them have disadvantages. In this paper, we propose a novel approach to imputing these missing values based on a practical similarity measure between gene pairs. Our approach takes both gene expression values and gene ontology (GO) information into consideration. We apply our approach to a real microarray dataset and compare its imputation accuracy with that of other methods. Experimental results show that our approach can estimate missing values in microarray data effectively.
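The abstract does not specify the similarity measure or the imputation rule, so the following is only a minimal sketch of the general idea: score each gene pair by combining an expression-based similarity with a GO-based similarity, then fill a missing entry with a similarity-weighted mean over the most similar genes. Everything concrete here is an assumption made for illustration and not the paper's method: the Jaccard overlap of GO term sets, the Euclidean-distance similarity over co-observed columns, the mixing weight `alpha`, the neighbour count `k`, and the placeholder gene and term names.

```python
import math


def go_similarity(terms_a, terms_b):
    """Jaccard overlap of two GO term sets (an assumed stand-in measure)."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def expression_similarity(row_a, row_b):
    """Similarity in (0, 1] from RMS distance over columns observed in both rows."""
    shared = [(a, b) for a, b in zip(row_a, row_b)
              if a is not None and b is not None]
    if not shared:
        return 0.0
    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
    return 1.0 / (1.0 + dist)


def impute(expr, go_terms, alpha=0.5, k=2):
    """Fill None entries with a similarity-weighted mean of the k most similar genes.

    expr:     dict mapping gene name -> list of values (None marks a missing entry)
    go_terms: dict mapping gene name -> set of GO term identifiers
    alpha:    assumed weight of expression similarity versus GO similarity
    """
    genes = list(expr)
    filled = {g: list(row) for g, row in expr.items()}
    for g in genes:
        for j, value in enumerate(expr[g]):
            if value is not None:
                continue
            candidates = []
            for h in genes:
                # Only genes with an observed value in column j can contribute.
                if h == g or expr[h][j] is None:
                    continue
                sim = (alpha * expression_similarity(expr[g], expr[h])
                       + (1 - alpha) * go_similarity(go_terms.get(g, set()),
                                                     go_terms.get(h, set())))
                candidates.append((sim, expr[h][j]))
            candidates.sort(key=lambda c: c[0], reverse=True)
            top = candidates[:k]
            total = sum(s for s, _ in top)
            if total > 0:  # otherwise the entry is left missing
                filled[g][j] = sum(s * v for s, v in top) / total
    return filled


if __name__ == "__main__":
    # Toy data with placeholder gene names and GO term labels.
    expr = {
        "g1": [1.0, 2.0, None],
        "g2": [1.0, 2.0, 3.0],
        "g3": [5.0, 6.0, 9.0],
    }
    go_terms = {
        "g1": {"term_a", "term_b"},
        "g2": {"term_a", "term_b"},
        "g3": {"term_c"},
    }
    print(impute(expr, go_terms))
```

In this toy example the estimate for the missing entry of `g1` is pulled mostly toward `g2`, which matches `g1` in both expression profile and GO annotation, while `g3` contributes only a small weight. Imputation accuracy could then be assessed by masking known entries, imputing them, and comparing the estimates against the held-out values.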
